[Pacemaker] Two node KVM cluster

Tue Apr 30 01:00:15 EDT 2013

On 17/04/2013, at 4:02 PM, Oriol Mula-Valls <oriol.mula-valls at ic3.cat> wrote:

> On 16/04/13 06:10, Andrew Beekhof wrote:
>> 
>> On 10/04/2013, at 3:20 PM, Oriol Mula-Valls<oriol.mula-valls at ic3.cat>  wrote:
>> 
>>> On 10/04/13 02:10, Andrew Beekhof wrote:
>>>> 
>>>> On 09/04/2013, at 7:31 PM, Oriol Mula-Valls<oriol.mula-valls at ic3.cat>   wrote:
>>>> 
>>>>> Thanks Andrew I've managed to set up the system and currently I have it working but still on testing.
>>>>> 
>>>>> I have configure external/ipmi as fencing device and then I force a reboot doing a echo b>   /proc/sysrq-trigger. The fencing is working properly as the node is shut off and the VM migrated. However, as soon as I turn on the fenced now and the OS has started the surviving is shut down. Is it normal or am I doing something wrong?
>>>> 
>>>> Can you clarify "turn on the fenced"?
>>>> 
>>> 
>>> To restart the fenced node I do either a power on with ipmitool or I power it on using the iRMC web interface.
>> 
>> Oh, "fenced now" was meant to be "fenced node".  That makes more sense now :)
>> 
>> To answer your question, I would not expect the surviving node to be fenced when the previous node returns.
>> The network between the two is still functional?
> 
> Sorry I didn't not realised the mistake even  while writing the answer :)
> 
> IPMI network is still working between the nodes.

Ok, but what about the network corosync is using?

> 
> Thanks,
> Oriol
> 
>> 
>>> 
>>>>> 
>>>>> On the other hand I've seen that in case I completely lose power fencing obviously fails. Would SBD stonith solve this issue?
>>>>> 
>>>>> Kind regards,
>>>>> Oriol
>>>>> 
>>>>> On 08/04/13 04:11, Andrew Beekhof wrote:
>>>>>> 
>>>>>> On 03/04/2013, at 9:15 PM, Oriol Mula-Valls<oriol.mula-valls at ic3.cat>    wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I've started with Linux HA about one year ago. Currently I'm facing a new project in which I have to set up two nodes with high available virtual machines. I have used as a starting point the Digimer's tutorial (https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial).
>>>>>>> 
>>>>>>> To deploy this new infrastructure I have two Fujitsu Primergy Rx100S7. Both machines have 8GB of RAM and 2x500GB HD. I started creating a software RAID1 with the internal drives and installing Debian 7.0 (Wheezy). Apart from the O.S. partition I have created 3 more partitions, one for the shared storage between both machines with OCFS2 and the two other will be used as PVs to create LVs to support the VMs (one for the VMs that will be primary on node1 an the other for primary machines on node2). These 3 partitions are replicated using DRBD.
>>>>>>> 
>>>>>>> The shared storage folder contains:
>>>>>>> * ISO images needed when provisioning VMs
>>>>>>> * scripts used to call virt-install which handles the creation of our VMs.
>>>>>>> * XML definition files which define the emulated hardware backing the VMs
>>>>>>> * old copies of the XML definition files.
>>>>>>> 
>>>>>>> I have more or less done the configuration for the OCFS2 fs and I was about to start the configuration of cLVM for one of the VGs but I have some doubts. I have one dlm for the OCFS2 filesystem, should I create another for cLVM RA?
>>>>>> 
>>>>>> No, there should only ever be one dlm resource (cloned like you have it)
>>>>>> 
>>>>>>> 
>>>>>>> This is the current configuration:
>>>>>>> node node1
>>>>>>> node node2
>>>>>>> primitive p_dlm_controld ocf:pacemaker:controld \
>>>>>>> 	op start interval="0" timeout="90" \
>>>>>>> 	op stop interval="0" timeout="100" \
>>>>>>> 	op monitor interval="10"
>>>>>>> primitive p_drbd_shared ocf:linbit:drbd \
>>>>>>> 	params drbd_resource="shared" \
>>>>>>> 	op monitor interval="10" role="Master" timeout="20" \
>>>>>>> 	op monitor interval="20" role="Slave" timeout="20" \
>>>>>>> 	op start interval="0" timeout="240s" \
>>>>>>> 	op stop interval="0" timeout="120s"
>>>>>>> primitive p_drbd_vm_1 ocf:linbit:drbd \
>>>>>>> 	params drbd_resource="vm_1" \
>>>>>>> 	op monitor interval="10" role="Master" timeout="20" \
>>>>>>> 	op monitor interval="20" role="Slave" timeout="20" \
>>>>>>> 	op start interval="0" timeout="240s" \
>>>>>>> 	op stop interval="0" timeout="120s"
>>>>>>> primitive p_fs_shared ocf:heartbeat:Filesystem \
>>>>>>> 	params device="/dev/drbd/by-res/shared" directory="/shared" fstype="ocfs2" \
>>>>>>> 	meta target-role="Started" \
>>>>>>> 	op monitor interval="10"
>>>>>>> primitive p_ipmi_node1 stonith:external/ipmi \
>>>>>>> 	params hostname="node1" userid="admin" passwd="xxx" ipaddr="10.0.0.2" interface="lanplus"
>>>>>>> primitive p_ipmi_node2 stonith:external/ipmi \
>>>>>>> 	params hostname="node2" userid="admin" passwd="xxx" ipaddr="10.0.0.3" interface="lanplus"
>>>>>>> primitive p_libvirtd lsb:libvirt-bin \
>>>>>>> 	op monitor interval="120s" \
>>>>>>> 	op start interval="0" \
>>>>>>> 	op stop interval="0"
>>>>>>> primitive p_o2cb ocf:pacemaker:o2cb \
>>>>>>> 	op start interval="0" timeout="90" \
>>>>>>> 	op stop interval="0" timeout="100" \
>>>>>>> 	op monitor interval="10" \
>>>>>>> 	meta target-role="Started"
>>>>>>> group g_shared p_dlm_controld p_o2cb p_fs_shared
>>>>>>> ms ms_drbd_shared p_drbd_shared \
>>>>>>> 	meta master-max="2" clone-max="2" notify="true"
>>>>>>> ms ms_drbd_vm_1 p_drbd_vm_1 \
>>>>>>> 	meta master-max="2" clone-max="2" notify="true"
>>>>>>> clone cl_libvirtd p_libvirtd \
>>>>>>> 	meta globally-unique="false" interlave="true"
>>>>>>> clone cl_shared g_shared \
>>>>>>> 	meta interleave="true"
>>>>>>> location l_ipmi_node1 p_ipmi_node1 -inf: node1
>>>>>>> location l_ipmi_node2 p_ipmi_node2 -inf: node2
>>>>>>> order o_drbd_before_shared inf: ms_drbd_shared:promote cl_shared:start
>>>>>>> 
>>>>>>> Packages' versions:
>>>>>>> clvm                               2.02.95-7
>>>>>>> corosync                           1.4.2-3
>>>>>>> dlm-pcmk                           3.0.12-3.2+deb7u2
>>>>>>> drbd8-utils                        2:8.3.13-2
>>>>>>> libdlm3                            3.0.12-3.2+deb7u2
>>>>>>> libdlmcontrol3                     3.0.12-3.2+deb7u2
>>>>>>> ocfs2-tools                        1.6.4-1+deb7u1
>>>>>>> ocfs2-tools-pacemaker              1.6.4-1+deb7u1
>>>>>>> openais                            1.1.4-4.1
>>>>>>> pacemaker                          1.1.7-1
>>>>>>> 
>>>>>>> As this is my first serious set up suggestions are more than welcome.
>>>>>>> 
>>>>>>> Thanks for your help.
>>>>>>> 
>>>>>>> Oriol
>>>>>>> --
>>>>>>> Oriol Mula Valls
>>>>>>> Institut Català de Ciències del Clima (IC3)
>>>>>>> Doctor Trueta 203 - 08005 Barcelona
>>>>>>> Tel:+34 93 567 99 77
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>> 
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>> 
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Oriol Mula Valls
>>>>> Institut Català de Ciències del Clima (IC3)
>>>>> Doctor Trueta 203 - 08005 Barcelona
>>>>> Tel:+34 93 567 99 77
>>>>> 
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>> 
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>> 
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>> 
>>> 
>>> 
>>> --
>>> Oriol Mula Valls
>>> Institut Català de Ciències del Clima (IC3)
>>> Doctor Trueta 203 - 08005 Barcelona
>>> Tel:+34 93 567 99 77
>>> 
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>> 
>> 
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>> 
> 
> 
> -- 
> Oriol Mula Valls
> Institut Català de Ciències del Clima (IC3)
> Doctor Trueta 203 - 08005 Barcelona
> Tel:+34 93 567 99 77
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org