[Pacemaker] Trouble with Xen high availability. Can't get it.

Andreas Kurz andreas at hastexo.com
Tue Dec 6 19:41:46 EST 2011


Hello,

comments inline ...

On 12/06/2011 01:17 AM, Богомолов Дмитрий Викторович wrote:
> Hello, thanks for your answer.
> 
> 06 December 2011, 02:08, from Andreas Kurz <andreas.kurz at gmail.com>:
>> Hello,
>>
>> On 12/05/2011 12:57 PM, Богомолов Дмитрий Викторович wrote:
>>> Hello. I made a cluster with two nodes (Ubuntu 11.10 + corosync + drbd
>>> + cman + Pacemaker) and configured a Xen resource to start a virtual
>>> machine (VM1 for short, Ubuntu 10.10); the virtual machine's disks are
>>> on the drbd resource. So now I am trying to test availability.
>>
>> And how did you configure it? Hard to comment without seeing any
>> configuration.
> 
> $cat /etc/drbd.conf
> include "drbd.d/global_common.conf";
> include "drbd.d/*.res";
> resource clusterdata {
> 	meta-disk internal;
> 	device	/dev/drbd1;
> 	protocol C;
> 	syncer {
> 		verify-alg	sha1;
> 		rate 33M;
> 	}
> 	net {
> 		allow-two-primaries;
> 	}
> 	on blaster {
> 		disk /dev/mapper/turrel-cluster_storage;
> 		address 192.168.0.254:7789;
> 	}
> 	on turrel {
> 		disk /dev/mapper/turrel-cluster_storage;
> 		address 192.168.0.253:7789;
> 	}
> }

You must configure the resource-and-stonith fencing policy in DRBD to
make this work reliably.
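
A sketch of what that could look like in the resource section of your
drbd.conf (this only sets the policy; it relies on a fence-peer
handler, see the script I mention further down):

	disk {
		fencing resource-and-stonith;
	}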

> 
> $cat /etc/corosync/corosync.conf
> # Please read the corosync.conf.5 manual page
> compatibility: whitetank
> 
> totem {
> 	version: 2
> 	secauth: off
> 	threads: 0
> 	interface {
> 		ringnumber: 0
> 		bindnetaddr: 192.168.0.0
> 		mcastaddr: 239.0.0.1
> 		mcastport: 4000
> 	}
> }
> 
> logging {
> 	fileline: off
> 	to_stderr: off
> 	to_logfile: yes
> 	to_syslog: off
> 	logfile: /var/log/corosync/corosync.log
> 	debug: off
> 	timestamp: on
> 	logger_subsys {
> 		subsys: AMF
> 		debug: off
> 	}
> }
> 
> amf {
> 	mode: disabled
> }
> 
> service {
> 	# Load the Pacemaker Cluster Resource Manager
> 	name: 	pacemaker
> 	clustername:	tumba
> 	ver:	1
> }

So you are using cman ... then no corosync.conf should be needed.

> 
> $cat /etc/cluster/cluster.conf
> <?xml version="1.0"?>
> <cluster config_version="1" name="tumba">
>  <logging debug="off"/>
>  <clusternodes>
>   <clusternode name="blaster" nodeid="1">
>    <fence>
>     <method name="pcmk-redirect">
>      <device name="pcmk" port="blaster"/>
>     </method>
>    </fence>
>   </clusternode>
>   <clusternode name="turrel" nodeid="2">
>    <fence>
>     <method name="pcmk-redirect">
>      <device name="pcmk" port="turrel"/>
>     </method>
>    </fence>
>   </clusternode>
>  </clusternodes>
>  <fencedevices>
>   <fencedevice name="pcmk" agent="fence_pcmk"/>
>  </fencedevices>
> </cluster>
> 
> $sudo crm configure show
> node blaster \
> 	attributes standby="off"
> node turrel \
> 	attributes standby="off"
> primitive ClusterData ocf:linbit:drbd \
> 	params drbd_resource="clusterdata" \
> 	op monitor interval="60s"
> primitive ClusterFS ocf:heartbeat:Filesystem \
> 	params device="/dev/drbd/by-res/clusterdata" directory="/mnt/cluster" fstype="gfs2" \
> 	op start interval="0" timeout="60s" \
> 	op stop interval="0" timeout="60s" \
> 	op monitor interval="60s" timeout="60s"
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
> 	params ip="192.168.122.252" cidr_netmask="32" clusterip_hash="sourceip" \
> 	op monitor interval="30s"
> primitive XenDom ocf:heartbeat:Xen \
> 	params xmfile="/etc/xen/xen1.example.com.cfg" \
> 	meta is-managed="true" \
> 	utilization cores="1" mem="512" \
> 	op monitor interval="1min" timeout="30sec" start-delay="10sec" \
> 	op start interval="0" timeout="1min" \
> 	op stop interval="0" timeout="60sec" \
> 	op migrate_to interval="0" timeout="180sec"
> ms ClusterDataClone ClusterData \
> 	meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> clone ClusterFSClone ClusterFS \
> 	meta target-role="Started" is-managed="true"
> clone IP ClusterIP \
> 	meta globally-unique="true" clone-max="2" clone-node-max="2"
> clone XenDomClone XenDom \
> 	meta target-role="Started"
> location cli-prefer-ClusterFSClone ClusterFSClone \
> 	rule $id="cli-prefer-rule-ClusterFSClone" inf: #uname eq blaster and #uname eq blaster
> location prefere-blaster XenDomClone 50: blaster
> colocation XenDom-with-ClusterFS inf: XenDomClone ClusterFSClone
> colocation fs_on_drbd inf: ClusterFSClone ClusterDataClone:Master
> order ClusterFS-after-ClusterData inf: ClusterDataClone:promote ClusterFSClone:start
> order XenDom-after-ClusterFS inf: ClusterFSClone XenDomClone
> property $id="cib-bootstrap-options" \
> 	dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
> 	cluster-infrastructure="cman" \
> 	expected-quorum-votes="2" \
> 	stonith-enabled="false" \
> 	no-quorum-policy="ignore" \
> 	last-lrm-refresh="1323127127"
> rsc_defaults $id="rsc-options" \
> 	resource-stickiness="100"

Also enable stonith in Pacemaker to avoid data corruption (your
configuration above still has stonith-enabled="false").
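
For a test cluster without real fencing hardware, a minimal sketch in
crm shell syntax could look like this, using the external/ssh agent
(good enough for testing only, it cannot fence a truly hung node):

	primitive st-ssh stonith:external/ssh \
		params hostlist="blaster turrel"
	clone fencing-clone st-ssh
	property stonith-enabled="true"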

> 
> $sudo crm_mon -1
> ============
> Last updated: Tue Dec  6 10:54:38 2011
> Stack: cman
> Current DC: blaster - partition with quorum
> Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
> 
> Online: [ blaster turrel ]
> 
>  Master/Slave Set: ClusterDataClone [ClusterData]
>      Masters: [ blaster turrel ]
>  Clone Set: IP [ClusterIP] (unique)
>      ClusterIP:0	(ocf::heartbeat:IPaddr2):	Started blaster
>      ClusterIP:1	(ocf::heartbeat:IPaddr2):	Started turrel
>  Clone Set: ClusterFSClone [ClusterFS]
>      Started: [ blaster turrel ]
>  Clone Set: XenDomClone [XenDom]
>      Started: [ blaster turrel ]
> 
>>
>>> I execute this command on node1:
>>>
>>> $sudo crm node standby
>>>
>>> And I receive this message:
>>>
>>> block drbd1: Sending state for detaching disk failed
>>>
>>> I notice that on node1 service drbd stops
>>>
>>> $cat /proc/drbd
>>>  1: cs:Unconfigured
>>>
>>> Is this normal? There is a following:
>>
>> Yes, a node in standby runs no resources.
>>
>>>
>>> Virtual machine doesn't stop. It confirms with icmp echo response
>>> from the VM1. I run interactive VM1 console on node2, with :
>>>
>>> $sudo xm console VM1
>>>
>>> I can see that it continues to work, and remote ssh session with VM1
>>> also continues to work.
>>
>> That looks like a working live-migration.
>>
>>>
>>> Then I bring back node1 , with:
>>>
>>> $sudo crm node online
>>>
>>> I receive messages:
>>>
>>> dlm: Using TCP for communications dlm: connecting to 1 dlm: got
>>> connection from 1
>>>
>>> ICMP echo responses from VM1 then stopped for about 15 sec. The
>>> interactive VM1 console on node2 and the remote ssh session with VM1
>>> both showed a shutdown process, i.e. VM1 was restarted on node2,
>>> which I believe shouldn't happen. Then I switched node2 to standby:
>>>
>>> $sudo crm node standby
>>>
>>> Also, I receive this message:
>>>
>>> block drbd1: Sending state for detaching disk failed
>>>
>>> I notice that the drbd service stops on node2. The interactive VM1
>>> console on node2 and the remote ssh session showed a shutdown
>>> process, but the interactive VM1 console on node1 kept working
>>> normally. ICMP echo responses from VM1 stopped for 275 sec, and
>>> during this time I couldn't get a remote ssh connection to VM1.
>>> After this long interval the Xen services started working again.
>>> Then I brought node2 back online:
>>
>> config???
>>
>>>
>>> $sudo crm node online
>>>
>>> The situation was similar to the one described earlier, i.e. ICMP
>>> echo responses from VM1 stopped for 15 sec, and the interactive VM1
>>> console on node1 and the remote ssh session with VM1 both showed a
>>> shutdown process, i.e. VM1 was restarted on node1.
>>>
>>> I have repeated this operation several times (4-5) with the same
>>> result, and tried adding this parameter to the Xen resource:
>>>
>>> meta allow-migrate="true"
>>>
>>> It didn't change the behavior.
>>>
>>> I wonder whether this allow-migrate parameter is necessary in an
>>> Active/Active configuration? It is not included in the Clusters from
>>> Scratch manual, but I saw it in other (active/passive) configs, so I
>>> assume it's not necessary, because the Xen services are equally
>>> started on both servers. And I expect that the failure of one node
>>> must not stop services on the other node. Am I thinking correctly?
>>>
>>
>> What? You are starting the same VM on both nodes ... are you serious?
> 
> Yes, the Xen resource starts it in an Active/Active Pacemaker configuration. I don't get what is wrong - do I need another approach?
> I want a highly available Xen cluster, where the failure of one host does not affect users, with zero downtime. Also, I want load balancing for VM1.
> 

No, really don't do this! You are running two OS instances on the same
storage and each instance has no knowledge of the state of the other ...
please, just think about it ...

If you really want "zero" downtime, you would need something like Xen
Remus to continuously replicate the complete VM state to a standby
instance.

For a typical HA setup you would monitor the VMs and restart them on
failure on the local node ... or, on node failure, on the other node.
Depending on the application, you can extend this setup with a service
load-balancing facility like LVS and ldirectord to balance load between
_two_ VMs, each using its _own_ storage.

>>
>>> So, how do I avoid such reboots of VM1? And what do I need to do to
>>> keep VM1 running continuously?
>>>
>>> What is the reason for such different recovery delays - 15 sec on
>>> node1 and 275 sec on node2? How can I reduce them, or better, avoid
>>> them?
>>>
>>> Do I need live migration? If yes, how do I set it up? I used the
>>> parameter meta allow-migrate="true", but it had no effect.
>>>
>>> Is it because I have not configured stonith yet? At least this is
>>> my assumption.
>>
>> Dual primary DRBD setup? Yes, you must use stonith.
> 
> I don't quite understand how to use it in my configuration.
> There is an example of setting up IPMI stonith in Clusters_from_Scratch/s-stonith-example.html. If I understood correctly, that example is for hosts which support IPMI. My hosts don't support it.
> 
> The nodes in my test cluster are: Node1 - a VMware virtual machine, Node2 - an old computer with a Pentium 4 2.8 GHz CPU and 1 GB RAM.
> Which stonith RA should I use with my configuration for the Xen resource - fence_xenapi, fence_node, external/xen0, or external/xen0-ha?
> A man page is available only for fence_xenapi, and that RA is meant for XenCenter, which I don't use.
> I don't know how to get help for the other RAs, because their man pages are missing. Maybe you can help with that.
>

For testing you can also use the ssh stonith agent in Pacemaker, and for
DRBD you can use the obliterate-peer.sh script as a fence handler ...
you can get it e.g. here:
https://alteeve.com/w/Obliterate-peer.sh_(DRBD_fence_handler)
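
Wired into your drbd.conf that would look roughly like this (the path
below is just an assumption, use wherever you install the script):

	handlers {
		fence-peer "/usr/lib/drbd/obliterate-peer.sh";
	}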

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

>>
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>>
>>> I will be grateful for any help.
>>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




