[Pacemaker] How to setup STONITH in a 2-node active/passive linux HA pacemaker cluster?

Andreas Kurz andreas at hastexo.com
Tue Mar 20 17:22:34 UTC 2012


On 03/20/2012 04:14 PM, Mathias Nestler wrote:
> Hi Dejan,
> 
> On 20.03.2012, at 15:25, Dejan Muhamedagic wrote:
> 
>> Hi,
>>
>> On Tue, Mar 20, 2012 at 08:52:39AM +0100, Mathias Nestler wrote:
>>> On 19.03.2012, at 20:26, Florian Haas wrote:
>>>
>>>> On Mon, Mar 19, 2012 at 8:14 PM, Mathias Nestler
>>>> <mathias.nestler at barzahlen.de> wrote:
>>>>> Hi everyone,
>>>>>
>>>>> I am trying to set up an active/passive (2 nodes) Linux-HA cluster
>>>>> with corosync and pacemaker to hold a PostgreSQL-Database up and
>>>>> running. It works via DRBD and a service-ip. If node1 fails, node2
>>>>> should take over. The same if PG runs on node2 and it fails.
>>>>> Everything works fine except the STONITH thing.
>>>>>
>>>>> Between the nodes is a dedicated HA-connection (10.10.10.X), so I
>>>>> have the following interface configuration:
>>>>>
>>>>> eth0            eth1            host
>>>>> 10.10.10.251    172.10.10.1     node1
>>>>> 10.10.10.252    172.10.10.2     node2
>>>>>
>>>>> Stonith is enabled and I am testing with a ssh-agent to kill nodes.
>>>>>
>>>>> crm configure property stonith-enabled=true
>>>>> crm configure property stonith-action=poweroff
>>>>> crm configure rsc_defaults resource-stickiness=100
>>>>> crm configure property no-quorum-policy=ignore
>>>>>
>>>>> crm configure primitive stonith_postgres stonith:external/ssh \
>>>>>              params hostlist="node1 node2"
>>>>> crm configure clone fencing_postgres stonith_postgres
>>>>
>>>> You're missing location constraints, and doing this with 2 primitives
>>>> rather than 1 clone is usually cleaner. The example below is for
>>>> external/libvirt rather than external/ssh, but you ought to be able to
>>>> apply the concept anyhow:
>>>>
>>>> http://www.hastexo.com/resources/hints-and-kinks/fencing-virtual-cluster-nodes
>>>>
>>>
>>> As I understood it, the cluster decides which node has to be stonith'ed.
>>> Besides this, I already tried the following configuration:
>>>
>>> crm configure primitive stonith1_postgres stonith:ssh \
>>>     params hostlist="node1" \
>>>     op monitor interval="25" timeout="10"
>>> crm configure primitive stonith2_postgres stonith:ssh \
>>>     params hostlist="node2" \
>>>     op monitor interval="25" timeout="10"
>>> crm configure location stonith1_not_on_node1 stonith1_postgres \
>>>     -inf: node1
>>> crm configure location stonith2_not_on_node2 stonith2_postgres \
>>>     -inf: node2
>>>
>>> The result is the same :/
>>
>> Neither ssh nor external/ssh are supported fencing options. Both
>> include a sleep before reboot which makes the window in which
>> it's possible for both nodes to fence each other larger than it
>> is usually the case with production quality stonith plugins.
> 
> I use this ssh-stonith only for testing. At the moment I am creating the
> cluster in a virtual environment. Besides this, what is the difference
> between ssh and external/ssh?

The first one (ssh) is a binary implementation, the second one
(external/ssh) is a simple shell script ... that's it ;-)

> My problem is that each node tries to kill the other. But I only want
> to kill the node with the postgres resource on it if the connection
> between the nodes breaks.

That is the expected behavior if you introduce a split-brain in a two
node cluster. Each node builds its own cluster partition and tries to
stonith the other "dead" node.

If you are using a virtualization environment managed by libvirt, you can
follow the link Florian posted. If you are running on a VMware or
VirtualBox testing environment, using sbd for fencing might be a good
option ... as shared storage can be provided easily.

Then you could also do a weak colocation of the single sbd stonith agent
instance with your postgres instance; in combination with the correct
start-timeout, you can get the behavior you want.
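
A minimal sketch of what such a configuration could look like in crm shell
syntax. Everything named here is an assumption for illustration: the
resource ids (stonith_sbd, sbd_prefers_postgres), the name of the postgres
resource (postgres), the shared disk path, the score of 100 and the 60s
start timeout are placeholders and not values taken from this thread. The
snippet only writes the fragment to a file; loading it into the cluster is
left as a commented-out step.

```shell
#!/bin/sh
# Write a hypothetical crm configuration fragment to a file for review.
# All ids, the device path, the score and the timeout are placeholders.
cat > ./sbd-fencing.crm <<'EOF'
primitive stonith_sbd stonith:external/sbd \
    params sbd_device="/dev/disk/by-id/EXAMPLE-SHARED-DISK" \
    op start interval="0" timeout="60"
colocation sbd_prefers_postgres 100: stonith_sbd postgres
EOF

# After adapting the placeholders, the fragment could be loaded with:
#   crm configure load update ./sbd-fencing.crm
echo "wrote ./sbd-fencing.crm"
```

The colocation uses a finite score rather than inf, so the stonith resource
prefers the node running postgres but is still allowed to run on the other
node when postgres moves or is stopped.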

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
>>
>> As for the configuration, I'd rather use the first one, just not
>> cloned. That also helps prevent mutual fencing.
>>
> 
> I cloned it because I also want the STONITH-feature if postgres lives on
> the other node. How can I achieve it?
> 
>> See also:
>>
>> http://www.clusterlabs.org/doc/crm_fencing.html
>> http://ourobengr.com/ha
>>
> 
> Thank you very much
> 
> Best
> Mathias
> 
>> Thanks,
>>
>> Dejan
>>
>>>> Hope this helps.
>>>> Cheers,
>>>> Florian
>>>>
>>>
>>> Best
>>> Mathias
>>>
>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org