[Pacemaker] resource is too active problem in a 2-node cluster

Andrew Beekhof andrew at beekhof.net
Wed Feb 19 18:13:41 EST 2014


On 20 Feb 2014, at 8:29 am, Aggarwal, Ajay <aaggarwal at verizon.com> wrote:

> I suspected the monitor action myself, based on the error messages. But the script returns either OCF_SUCCESS (0) or OCF_NOT_RUNNING (7) for the monitor action. I ran it manually too to confirm.

I'm going to have to say otherwise:

> Feb 04 11:27:38 [45168] gol-5-7-0       crmd:  warning: status_from_rc:     Action 8 (GOL-HA_monitor_0) on gol-5-7-6 failed (target: 7 vs. rc: 1): Error

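This indicates the agent returned an error (1). The operation is a probe (GOL-HA_monitor_0), so the cluster expected OCF_NOT_RUNNING (7) on a node where the resource had not been started, and got 1 (OCF_ERR_GENERIC) instead.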
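
For reference, a minimal sketch of a monitor that is safe to run as a probe. It assumes the service writes a pid file; the function name gol_ha_monitor and the PIDFILE path are hypothetical and not taken from your script.sh. Only the exit code values (0, 1, 7) come from the OCF spec.

```shell
#!/bin/sh
# OCF exit codes relevant to monitor (values defined by the OCF spec).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical pid file location; an assumption for this sketch only.
PIDFILE="${PIDFILE:-/var/run/gol-ha.pid}"

# Hypothetical monitor function. A probe may call it on a node where the
# resource was never started, so "nothing there" must map to 7, not 1.
gol_ha_monitor() {
    # No pid file: the resource is cleanly stopped on this node.
    [ -f "$PIDFILE" ] || return $OCF_NOT_RUNNING

    pid=$(cat "$PIDFILE" 2>/dev/null)

    # Empty or unreadable pid file: nothing we can call running.
    [ -n "$pid" ] || return $OCF_NOT_RUNNING

    # kill -0 sends no signal; it only checks that the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi

    # Stale pid file: the process is gone, so report not running.
    return $OCF_NOT_RUNNING
}
```

The agent's action dispatch would call this for monitor and exit with its return value. The point of the sketch is that a missing pid file, a missing binary path, or a failed status command before the first start should not fall through to a generic error (1), because the cluster treats any probe result other than 7 as evidence about the resource's state on that node.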

> ________________________________________
> From: Andrew Beekhof [andrew at beekhof.net]
> Sent: Monday, February 17, 2014 6:46 PM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] resource is too active problem in a 2-node cluster
> 
> On 18 Feb 2014, at 5:33 am, Ajay Aggarwal <aaggarwal at verizon.com> wrote:
> 
>> Thanks Andrew for pointing towards the OCF resource agent's list of "must implement" actions. I noticed that our OCF script only implements start, stop and monitor. It does not implement meta-data and validate-all.  Could this error be a result of these un-implemented actions?
> 
> Unlikely. More likely the monitor action is not correctly returning OCF_NOT_RUNNING if run before the resource is running.
> 
>> On 02/16/2014 09:15 PM, Andrew Beekhof wrote:
>>> On 12 Feb 2014, at 1:39 am, Ajay Aggarwal <aaggarwal at verizon.com> wrote:
>>> 
>>> 
>>>> Yes, we have cman (version: cman-3.0.12.1-49). We use manual fencing (I know it is not recommended). There is an external monitoring and fencing service that we use (our own).
>>>> 
>>>> Perhaps the subject line "resource is too active problem in a 2-node cluster" was misleading. The real problem is that the resource is *NOT* too active, but pacemaker thinks it is.
>>>> 
>>> It only thinks what the resource agent tells us.
>>> Sounds like script.sh isn't OCF compliant.
>>> 
>>> 
>>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_actions.html
>>> 
>>> 
>>> 
>>>> This leads to an undesirable recovery procedure. See the log lines below:
>>>> 
>>>> Feb 04 11:27:38 [45167] gol-5-7-0    pengine:  warning: unpack_rsc_op:     Processing failed op monitor for GOL-HA on gol-5-7-0: unknown error (1)
>>>> Feb 04 11:27:38 [45167] gol-5-7-0    pengine:  warning: unpack_rsc_op:     Processing failed op monitor for GOL-HA on gol-5-7-6: unknown error (1)
>>>> Feb 04 11:27:38 [45167] gol-5-7-0    pengine:    error: native_create_actions:     Resource GOL-HA (ocf::script.sh) is active on 2 nodes attempting recovery
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 02/10/2014 09:43 PM, Digimer wrote:
>>>> 
>>>>> On 10/02/14 09:13 PM, Aggarwal, Ajay wrote:
>>>>> 
>>>>>> I have a 2-node cluster with no-quorum-policy=ignore. I call these nodes node-0 and node-1. In addition, I have two cluster resources in a group: an IP address and an OCF script.
>>>>>> 
>>>>> Turning off quorum on a 2-node cluster is fine; in fact, it's required. However, that makes stonith all the more important. Without stonith, in any cluster but in particular on two-node clusters, things will not work right.
>>>>> 
>>>>> First and foremost: configure stonith and test it to make sure it works.
>>>>> 
>>>>> 
>>>>>>   Pacemaker version: 1.1.10
>>>>>>   Corosync version: 1-4.1-15
>>>>>>   OS: CentOS 6.4
>>>>>> 
>>>>> With CentOS/RHEL 6, you need cman as well. Please be sure to also configure fence_pcmk in cluster.conf to "hook" it into pacemaker's real fencing.
>>>>> 
>>>>> 
>>>>>> What am I doing wrong?
>>>>>> 
>>>>> <snip>
>>>>> 
>>>>>>        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>>>>>> 
>>>>> That. :)
>>>>> 
>>>>> Once you have stonith working, see if the problem remains.
>>>>> 
>>>>> 
>>>> 
>>>> _______________________________________________
>>>> Pacemaker mailing list:
>>>> Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>> 
>>>> 
>>>> Project Home:
>>>> http://www.clusterlabs.org
>>>> 
>>>> Getting started:
>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> 
>>>> Bugs:
>>>> http://bugs.clusterlabs.org
>>> 
>>> 
