[Pacemaker] Problem in Stonith configuration
neha chatrath
nehachatrath at gmail.com
Tue Oct 18 07:00:48 UTC 2011
Hello,
Minor updates to the first requirement.
1. If a resource fails, the node should reboot (through the fencing mechanism) and
the resources should restart on the node.
2. If the physical link between the nodes in a cluster fails, then that node
should be isolated (a kind of power-down) and the resources should continue
to run on the other nodes.
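
For both requirements, a power-fencing device is needed in production; the suicide plugin cannot fence a node that is dead or unreachable. A minimal sketch using the external/ipmi plugin follows, purely as an illustration: the management IP addresses and credentials here are hypothetical placeholders that would have to be replaced with the real BMC details of each node.

primitive fence_mcg1 stonith:external/ipmi \
        params hostname="mcg1" ipaddr="192.168.1.210" userid="admin" \
               passwd="secret" interface="lan" \
        op monitor interval="60s"
primitive fence_mcg2 stonith:external/ipmi \
        params hostname="mcg2" ipaddr="192.168.1.211" userid="admin" \
               passwd="secret" interface="lan" \
        op monitor interval="60s"
location l_fence_mcg1 fence_mcg1 -inf: mcg1
location l_fence_mcg2 fence_mcg2 -inf: mcg2

The location constraints keep each fencing device off the node it is meant to fence, so the surviving node is always the one that carries out the power operation.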
Apologies for the inconvenience.
Thanks and regards
Neha Chatrath
On Tue, Oct 18, 2011 at 12:08 PM, neha chatrath <nehachatrath at gmail.com> wrote:
> Hello Andreas,
>
> Thanks for the reply.
>
> Could you please suggest which STONITH plugin I should use for the
> production release of my software? I have the following system requirements:
> 1. If a node in the cluster fails, it should be rebooted and resources should
> restart on the node.
> 2. If the physical link between the nodes in a cluster fails, then that node
> should be isolated (a kind of power-down) and the resources should continue
> to run on the other nodes.
>
> I have different types of resources, e.g. primitive, master-slave and clone,
> running on my system.
>
> Thanks and regards
> Neha Chatrath
>
>
> Date: Mon, 17 Oct 2011 15:08:16 +0200
> From: Andreas Kurz <andreas at hastexo.com>
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] Problem in Stonith configuration
> Message-ID: <4E9C28C0.8070904 at hastexo.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello,
>
>
> On 10/17/2011 12:34 PM, neha chatrath wrote:
> > Hello,
> > I am configuring a 2 node cluster with following configuration:
> >
> > *[root at MCG1 init.d]# crm configure show
> >
> > node $id="16738ea4-adae-483f-9d79-b0ecce8050f4" mcg2 \
> > attributes standby="off"
> >
> > node $id="3d507250-780f-414a-b674-8c8d84e345cd" mcg1 \
> > attributes standby="off"
> >
> > primitive ClusterIP ocf:heartbeat:IPaddr \
> > params ip="192.168.1.204" cidr_netmask="255.255.255.0" nic="eth0:1" \
> >
> > op monitor interval="40s" timeout="20s" \
> > meta target-role="Started"
> >
> > primitive app1_fencing stonith:suicide \
> > op monitor interval="90" \
> > meta target-role="Started"
> >
> > primitive myapp1 ocf:heartbeat:Redundancy \
> > op monitor interval="60s" role="Master" timeout="30s" on-fail="standby" \
> > op monitor interval="40s" role="Slave" timeout="40s" on-fail="restart"
> >
> > primitive myapp2 ocf:mcg:Redundancy_myapp2 \
> > op monitor interval="60" role="Master" timeout="30" on-fail="standby" \
> > op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
> >
> > primitive myapp3 ocf:mcg:red_app3 \
> > op monitor interval="60" role="Master" timeout="30" on-fail="fence" \
> > op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
> >
> > ms ms_myapp1 myapp1 \
> > meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> > notify="true"
> >
> > ms ms_myapp2 myapp2 \
> > meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> > notify="true"
> >
> > ms ms_myapp3 myapp3 \
> > meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> > notify="true"
> >
> > colocation myapp1_col inf: ClusterIP ms_myapp1:Master
> >
> > colocation myapp2_col inf: ClusterIP ms_myapp2:Master
> >
> > colocation myapp3_col inf: ClusterIP ms_myapp3:Master
> >
> > order myapp1_order inf: ms_myapp1:promote ClusterIP:start
> >
> > order myapp2_order inf: ms_myapp2:promote ms_myapp1:start
> >
> > order myapp3_order inf: ms_myapp3:promote ms_myapp2:start
> >
> > property $id="cib-bootstrap-options" \
> > dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
> > cluster-infrastructure="Heartbeat" \
> > stonith-enabled="true" \
> > no-quorum-policy="ignore"
> >
> > rsc_defaults $id="rsc-options" \
> > resource-stickiness="100" \
> > migration-threshold="3"
> > *
>
> > I start the Heartbeat daemon on only one of the nodes, e.g. mcg1, but none
> > of the resources (myapp1, myapp2, etc.) get started even on this node.
> > Following is the output of the "*crm_mon -f*" command:
> >
> > *Last updated: Mon Oct 17 10:19:22 2011
>
> > Stack: Heartbeat
> > Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd)- partition with
> > quorum
> > Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
> > 2 Nodes configured, unknown expected votes
> > 5 Resources configured.
> > ============
> > Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)
>
> The cluster is waiting for a successful fencing event before starting
> any resources ... that is the only way it can be sure the second node is
> running no resources.
>
> Since you are using the suicide plugin, this will never happen as long as
> Heartbeat is not started on that node. If this is only a _test_ setup, go
> with the ssh or even the null STONITH plugin ... but never use them on
> production systems!
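>
> For a test setup, a minimal external/ssh sketch could look like the
> following (an illustration only: it assumes passwordless root ssh between
> the nodes, and the resource and clone names are made up):
>
> primitive ssh_fencing stonith:external/ssh \
>         params hostlist="mcg1 mcg2" \
>         op monitor interval="60s"
> clone fencing_clone ssh_fencing
>
> Cloning the device lets each node fence the other. Note that external/ssh
> simply reboots the target over ssh, so it cannot fence a node whose OS is
> hung or unreachable, which is why it is unsuitable for production.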
>
> Regards,
> Andreas
>
>
> On Mon, Oct 17, 2011 at 4:04 PM, neha chatrath <nehachatrath at gmail.com> wrote:
>
>> Hello,
>> I am configuring a 2 node cluster with following configuration:
>>
>> *[root at MCG1 init.d]# crm configure show
>>
>> node $id="16738ea4-adae-483f-9d79-b0ecce8050f4" mcg2 \
>> attributes standby="off"
>>
>> node $id="3d507250-780f-414a-b674-8c8d84e345cd" mcg1 \
>> attributes standby="off"
>>
>> primitive ClusterIP ocf:heartbeat:IPaddr \
>> params ip="192.168.1.204" cidr_netmask="255.255.255.0" nic="eth0:1" \
>>
>> op monitor interval="40s" timeout="20s" \
>> meta target-role="Started"
>>
>> primitive app1_fencing stonith:suicide \
>> op monitor interval="90" \
>> meta target-role="Started"
>>
>> primitive myapp1 ocf:heartbeat:Redundancy \
>> op monitor interval="60s" role="Master" timeout="30s" on-fail="standby" \
>> op monitor interval="40s" role="Slave" timeout="40s" on-fail="restart"
>>
>> primitive myapp2 ocf:mcg:Redundancy_myapp2 \
>> op monitor interval="60" role="Master" timeout="30" on-fail="standby" \
>> op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>>
>> primitive myapp3 ocf:mcg:red_app3 \
>> op monitor interval="60" role="Master" timeout="30" on-fail="fence" \
>> op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>>
>> ms ms_myapp1 myapp1 \
>> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
>> notify="true"
>>
>> ms ms_myapp2 myapp2 \
>> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
>> notify="true"
>>
>> ms ms_myapp3 myapp3 \
>> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
>> notify="true"
>>
>> colocation myapp1_col inf: ClusterIP ms_myapp1:Master
>>
>> colocation myapp2_col inf: ClusterIP ms_myapp2:Master
>>
>> colocation myapp3_col inf: ClusterIP ms_myapp3:Master
>>
>> order myapp1_order inf: ms_myapp1:promote ClusterIP:start
>>
>> order myapp2_order inf: ms_myapp2:promote ms_myapp1:start
>>
>> order myapp3_order inf: ms_myapp3:promote ms_myapp2:start
>>
>> property $id="cib-bootstrap-options" \
>> dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
>> cluster-infrastructure="Heartbeat" \
>> stonith-enabled="true" \
>> no-quorum-policy="ignore"
>>
>> rsc_defaults $id="rsc-options" \
>> resource-stickiness="100" \
>> migration-threshold="3"
>> *
>> I start the Heartbeat daemon on only one of the nodes, e.g. mcg1, but none
>> of the resources (myapp1, myapp2, etc.) get started even on this node.
>> Following is the output of the "*crm_mon -f*" command:
>>
>> *Last updated: Mon Oct 17 10:19:22 2011
>> Stack: Heartbeat
>> Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd)- partition with
>> quorum
>> Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
>> 2 Nodes configured, unknown expected votes
>> 5 Resources configured.
>> ============
>> Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)
>> Online: [ mcg1 ]
>> app1_fencing (stonith:suicide): Started mcg1
>>
>> Migration summary:
>> * Node mcg1:
>> *
>> When I set "stonith-enabled" to false, then all my resources come up.
>>
>> Can somebody help me with STONITH configuration?
>>
>> Cheers
>> Neha Chatrath
>> KEEP SMILING!!!!
>>
>
>