[Pacemaker] Trouble with Xen high availability. Can't get it.
Andreas Kurz
andreas.kurz at gmail.com
Mon Dec 5 22:02:51 UTC 2011
Hello,
On 12/05/2011 12:57 PM, Богомолов Дмитрий Викторович wrote:
> Hello. I built a two-node cluster (Ubuntu 11.10 + corosync + drbd
> + cman + Pacemaker) and configured a Xen resource to start a virtual
> machine (VM1 for short, Ubuntu 10.10); the virtual machine's disks
> are on the drbd resource. Now I am testing availability.
And how did you configure it? Hard to comment without seeing any
configuration.
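Just as a rough sketch of the shape we'd expect for a dual-primary
DRBD + Xen setup (all resource names, the drbd resource name and the
xmfile path below are placeholders, not your actual values):

primitive p_drbd_vm1 ocf:linbit:drbd \
        params drbd_resource="vm1" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"
ms ms_drbd_vm1 p_drbd_vm1 \
        meta master-max="2" clone-max="2" notify="true" interleave="true"
primitive p_xen_vm1 ocf:heartbeat:Xen \
        params xmfile="/etc/xen/VM1.cfg"
colocation c_xen_on_drbd inf: p_xen_vm1 ms_drbd_vm1:Master
order o_drbd_before_xen inf: ms_drbd_vm1:promote p_xen_vm1:start

Please post the output of "crm configure show" so we can see where
your configuration differs.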
> I execute this command on node1:
>
> $sudo crm node standby
>
> And I receive this message:
>
> block drbd1: Sending state for detaching disk failed
>
> I notice that on node1 the drbd service stops
>
> $cat /proc/drbd
> 1: cs:Unconfigured
>
> Is this normal? Then the following happens:
Yes, a node in standby runs no resources.
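That cs:Unconfigured is expected; Pacemaker demoted and stopped the
DRBD resource on the standby node. You can double-check with e.g.:

$sudo crm_mon -1
$cat /proc/drbd

crm_mon on the remaining node should show everything running there,
and Unconfigured in /proc/drbd on the standby node is normal.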
>
> The virtual machine does not stop; ICMP echo responses from VM1
> confirm this. I open an interactive VM1 console on node2 with:
>
> $sudo xm console VM1
>
> I can see that it continues to work, and the remote ssh session with
> VM1 also keeps working.
That looks like a working live-migration.
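If you want to be sure it really was a live-migration and not a
stop/start, check the uptime inside VM1 and watch the domain list on
both dom0s while the node goes into standby:

$sudo xm list
$uptime

xm list on both nodes should show VM1 moving over without a new boot,
and the uptime inside VM1 should not reset.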
>
> Then I bring node1 back with:
>
> $sudo crm node online
>
> I receive messages:
>
> dlm: Using TCP for comunications
> dlm: connecting to 1
> dlm: got connection from 1
>
> At that point the ICMP echo responses from VM1 stopped for 15 seconds.
> The interactive VM1 console on node2 and the remote ssh session with
> VM1 both showed a shutdown process, i.e. VM1 was restarted on node2,
> which I believe should not happen. Next I switch off node2:
>
> $sudo crm node standby
>
> Also, I receive this message:
>
> block drbd1: Sending state for detaching disk failed
>
> I notice that on node2 the drbd service stops. The interactive VM1
> console on node2 and the remote ssh session showed a shutdown process,
> but the interactive VM1 console on node1 works normally. The ICMP
> echo responses from VM1 stopped for 275 sec, and during this time I
> cannot get a remote ssh connection to VM1. After this long interval
> the Xen services start working again. Then I bring node2 back online:
config???
>
> $sudo crm node online
>
> The situation is similar to the one described earlier, i.e. the ICMP
> echo responses from VM1 stopped for 15 seconds. The interactive VM1
> console on node1 and the remote ssh session with VM1 both showed a
> shutdown process, i.e. VM1 was restarted on node1.
>
> I have repeated this operation a few times (4-5) with the same
> result, and tried adding this to the parameters of the Xen resource:
>
> meta allow-migrate="true"
>
> It did not change the behavior.
>
> I wonder whether this allow-migrate parameter is necessary in an
> Active/Active configuration? It is not included in the Clusters from
> Scratch manual, but I saw it in other (active/passive) configurations,
> so I assume it is not necessary, because the Xen services are started
> equally on both servers. And I expect that a failure of one node must
> not stop services on the other node. Am I thinking correctly?
>
What? You are starting the same VM on both nodes ... are you serious?
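To be clear: Active/Active here only means both nodes are able to
host resources and a VM can migrate in either direction. The Xen
resource for a single VM must still be one primitive, never a clone,
otherwise the same domain is started on both hosts and its disk will
get corrupted. And if you want live-migration instead of stop/start,
allow-migrate is needed, together with sensible migration timeouts.
Roughly (placeholder names again):

primitive p_xen_vm1 ocf:heartbeat:Xen \
        params xmfile="/etc/xen/VM1.cfg" \
        meta allow-migrate="true" \
        op monitor interval="30s" \
        op migrate_to timeout="300s" \
        op migrate_from timeout="240s"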
> So, how can I avoid such reboots of VM1? And what do I need to do to
> keep VM1 running continuously?
>
> What is the reason for the different recovery delays (15 sec on node1
> and 275 sec on node2)? How can I reduce them, or better, avoid them?
>
> Do I need live migration? If yes, how do I set it up? I used the meta
> parameter allow-migrate="true", but it had no effect.
>
> Could it be because I have not configured STONITH yet? At least that
> is my assumption.
Dual primary DRBD setup? Yes, you must use stonith.
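Running dual-primary without stonith will sooner or later destroy
your data. As a sketch, the DRBD side usually looks like this in
drbd.conf (the resource name is a placeholder):

resource vm1 {
  net {
    allow-two-primaries;
  }
  disk {
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}

On the Pacemaker side add a stonith primitive that matches your
hardware (IPMI, iLO, a switched PDU, ...) and enable it with:

crm configure property stonith-enabled="true"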
Regards,
Andreas
--
Need help with Pacemaker?
http://www.hastexo.com/now
>
> I will be grateful for any help.