[Pacemaker] Very strange behavior on asymmetric cluster

Mon Mar 14 16:33:10 EDT 2011

14.03.2011 23:07, Arthur B. Olsen:
> If a mysql server is running on a cluster node which is not defined to 
> run the mysql resource, pacemaker will mark it unmanaged and will not 
> start it on the node which it is supposed to run on. The same goes for 
> nfs-common. My nfs servers should run the nfs-common and 
> nfs-kernel-server resources, and all other nodes have nfs-common 
> installed. So pacemaker just picks one random node, marks the 
> nfs-common resource as running unmanaged there, and will not start it 
> where I specifically told it to run.
>
> Likewise, I cannot have two drbd raids with the same name on different 
> pairs of nodes. My two nfs servers have a drbd raid between them and my 
> mysql servers have a drbd raid running between them. Both had their 
> resources called r0, and pacemaker set one to be slave and one to be 
> master, completely disregarding my location rules. Same with the mount 
> point. I can't use the same folder name on both the sql and nfs servers 
> to mount the drbd0 disk in, because pacemaker will consider it mounted 
> and not try to mount the second. Changing the names of the resources 
> and the mount point solved the drbd issues.
>
> Right now a mysql process is running as a test on a web server in the 
> cluster, and pacemaker will not start it on my sql servers. The same 
> goes for my nfs servers.
>
> What I don't understand is why pacemaker is trying to monitor services 
> on nodes that are not supposed to run them, and why it stops the 
> service on the nodes that are supposed to run it.

Resources become unmanaged because you have fencing disabled and the 
resource agent fails its "monitor" and "stop" actions on some node where 
the resource is not needed at all.
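
If the resources in question are LSB init scripts (an assumption, the 
original message does not say), one way to see this failure is to check 
what the script returns when the service is stopped. The LSB convention 
is that "status" exits 3 for a stopped service and "stop" exits 0 even if 
the service is already stopped. The function name check_lsb_stopped below 
is made up for this sketch:

```shell
# check_lsb_stopped SCRIPT
# Hypothetical helper: checks that an init script follows the LSB exit
# codes for a service that is NOT running. SCRIPT is a path you supply.
check_lsb_stopped() {
    script="$1"

    rc=0
    "$script" status >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 0 ]; then
        # Never call "stop" on a live service from a diagnostic helper.
        echo "service is running; skipping checks"
        return 2
    fi
    if [ "$rc" -ne 3 ]; then
        echo "status returned $rc, LSB expects 3 when stopped"
        return 1
    fi

    rc=0
    "$script" stop >/dev/null 2>&1 || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "stop returned $rc, LSB expects 0 when already stopped"
        return 1
    fi

    echo "ok"
    return 0
}
```

Run it on a node where the service is stopped, for example 
check_lsb_stopped /etc/init.d/nfs-common (the path is only an example). 
Anything other than "ok" points at a script the cluster is likely to 
misread.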

There is no way to tell the cluster that it is not supposed to run a 
service on some nodes. I believe this is a deficiency in pacemaker.

Currently, you have two options:

1. Delete the resource agents from the servers which are not supposed to 
run the resource.

2. Or make sure those "unused" resource agents return 5 ("not installed") 
for the monitor action. If they return anything else, you get the trouble 
you describe.
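
A minimal sketch of option 2, assuming you wrap or patch the agent 
yourself. The names ALLOWED_NODES, node_is_allowed and monitor_guard, and 
the node names sql1 and sql2, are invented for illustration; only the exit 
code 5 ("not installed") comes from the OCF convention:

```shell
# OCF exit code for "not installed".
OCF_ERR_INSTALLED=5

# Hypothetical: space-separated list of nodes that may run the service.
ALLOWED_NODES="${ALLOWED_NODES:-sql1 sql2}"

# node_is_allowed NODE: succeeds if NODE is listed in ALLOWED_NODES.
node_is_allowed() {
    for n in $ALLOWED_NODES; do
        [ "$n" = "$1" ] && return 0
    done
    return 1
}

# monitor_guard NODE: returns 5 on nodes outside the list, 0 otherwise.
monitor_guard() {
    node_is_allowed "$1" || return $OCF_ERR_INSTALLED
    return 0
}
```

In the agent's monitor action this could be called first, as in 
monitor_guard "$(uname -n)" || exit $?, so that the real status check 
only runs on the nodes that are meant to host the service.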

You could also split your cluster into two or three independent clusters, 
one per resource group.

--
Pavel Levshin