[Pacemaker] Trouble with Xen high availability. Can't get it.
Andreas Kurz
andreas.kurz at gmail.com
Mon Dec 5 22:02:51 UTC 2011
Hello,
On 12/05/2011 12:57 PM, Богомолов Дмитрий Викторович wrote:
> Hello. I built a two-node cluster (Ubuntu 11.10 + corosync + drbd
> + cman + Pacemaker) and configured a Xen resource to start a virtual
> machine (VM1 for short, Ubuntu 10.10); the virtual machine's disks
> are on the drbd resource. Now I am testing availability.
And how did you configure it? Hard to comment without seeing any
configuration.
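Just as a rough sketch of the shape we'd expect for a dual-primary
DRBD + Xen setup (all resource names, the drbd resource name and the
xmfile path below are placeholders, not your actual values):

primitive p_drbd_vm1 ocf:linbit:drbd \
        params drbd_resource="vm1" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"
ms ms_drbd_vm1 p_drbd_vm1 \
        meta master-max="2" clone-max="2" notify="true" interleave="true"
primitive p_xen_vm1 ocf:heartbeat:Xen \
        params xmfile="/etc/xen/VM1.cfg"
colocation c_xen_on_drbd inf: p_xen_vm1 ms_drbd_vm1:Master
order o_drbd_before_xen inf: ms_drbd_vm1:promote p_xen_vm1:start

Please post the output of "crm configure show" so we can see where
your configuration differs.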
> I execute this command on node1:
>
> $sudo crm node standby
>
> And I receive this message:
>
> block drbd1: Sending state for detaching disk failed
>
> I notice that on node1 the drbd service stops
>
> $cat /proc/drbd
> 1: cs:Unconfigured
>
> Is this normal? Then the following happens:
Yes, a node in standby runs no resources.
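That cs:Unconfigured is expected; Pacemaker demoted and stopped the
DRBD resource on the standby node. You can double-check with e.g.:

$sudo crm_mon -1
$cat /proc/drbd

crm_mon on the remaining node should show everything running there,
and Unconfigured in /proc/drbd on the standby node is normal.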
>
> The virtual machine does not stop; ICMP echo responses from VM1
> confirm this. I open an interactive VM1 console on node2 with:
>
> $sudo xm console VM1
>
> I can see that it continues to work, and the remote ssh session with
> VM1 also keeps working.
That looks like a working live-migration.
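If you want to be sure it really was a live-migration and not a
stop/start, check the uptime inside VM1 and watch the domain list on
both dom0s while the node goes into standby:

$sudo xm list
$uptime

xm list on both nodes should show VM1 moving over without a new boot,
and the uptime inside VM1 should not reset.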
>
> Then I bring node1 back with:
>
> $sudo crm node online
>
> I receive messages:
>
> dlm: Using TCP for comunications
> dlm: connecting to 1
> dlm: got connection from 1
>
> At that point the ICMP echo responses from VM1 stopped for 15 seconds.
> The interactive VM1 console on node2 and the remote ssh session with
> VM1 both showed a shutdown process, i.e. VM1 was restarted on node2,
> which I believe should not happen. Next I switch off node2:
>
> $sudo crm node standby
>
> Also, I receive this message:
>
> block drbd1: Sending state for detaching disk failed
>
> I notice that on node2 the drbd service stops. The interactive VM1
> console on node2 and the remote ssh session showed a shutdown process,
> but the interactive VM1 console on node1 works normally. The ICMP
> echo responses from VM1 stopped for 275 sec, and during this time I
> cannot get a remote ssh connection to VM1. After this long interval
> the Xen services start working again. Then I bring node2 back online:
config???
>
> $sudo crm node online
>
> The situation is similar to the one described earlier, i.e. the ICMP
> echo responses from VM1 stopped for 15 seconds. The interactive VM1
> console on node1 and the remote ssh session with VM1 both showed a
> shutdown process, i.e. VM1 was restarted on node1.
>
> I have repeated this operation a few times (4-5) with the same
> result, and tried adding this to the parameters of the Xen resource:
>
> meta allow-migrate="true"
>
> It did not change the behavior.
>
> I wonder whether this allow-migrate parameter is necessary in an
> Active/Active configuration? It is not included in the Clusters from
> Scratch manual, but I saw it in other (active/passive) configurations,
> so I assume it is not necessary, because the Xen services are started
> equally on both servers. And I expect that a failure of one node must
> not stop services on the other node. Am I thinking correctly?
>
What? You are starting the same VM on both nodes ... are you serious?
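To be clear: Active/Active here only means both nodes are able to
host resources and a VM can migrate in either direction. The Xen
resource for a single VM must still be one primitive, never a clone,
otherwise the same domain is started on both hosts and its disk will
get corrupted. And if you want live-migration instead of stop/start,
allow-migrate is needed, together with sensible migration timeouts.
Roughly (placeholder names again):

primitive p_xen_vm1 ocf:heartbeat:Xen \
        params xmfile="/etc/xen/VM1.cfg" \
        meta allow-migrate="true" \
        op monitor interval="30s" \
        op migrate_to timeout="300s" \
        op migrate_from timeout="240s"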
> So, how can I avoid such reboots of VM1? And what do I need to do to
> keep VM1 running continuously?
>
> What is the reason for the different recovery delays (15 sec on node1
> and 275 sec on node2)? How can I reduce them, or better, avoid them?
>
> Do I need live migration? If yes, how do I set it up? I used the meta
> parameter allow-migrate="true", but it had no effect.
>
> Could it be because I have not configured STONITH yet? At least that
> is my assumption.
Dual primary DRBD setup? Yes, you must use stonith.
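Running dual-primary without stonith will sooner or later destroy
your data. As a sketch, the DRBD side usually looks like this in
drbd.conf (the resource name is a placeholder):

resource vm1 {
  net {
    allow-two-primaries;
  }
  disk {
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}

On the Pacemaker side add a stonith primitive that matches your
hardware (IPMI, iLO, a switched PDU, ...) and enable it with:

crm configure property stonith-enabled="true"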
Regards,
Andreas
--
Need help with Pacemaker?
http://www.hastexo.com/now
>
> I will be grateful for any help.