[Pacemaker] pacemaker/corosync: a resource is started on 2 nodes

Mon Feb 23 18:16:39 EST 2015

> On 28 Jan 2015, at 9:20 pm, Sergey Arlashin <sergeyarl.maillist at gmail.com> wrote:
> 
> Hi!
> 
> I have a small corosync/pacemaker based cluster which consists of 4 nodes. 2 nodes are in standby mode, the other 2 actually handle all the resources.
> 
> corosync  ver. 1.4.7-1. 
> pacemaker  ver  1.1.11.
> os: ubuntu 12.04. 
> 
> In our production environment, which has plenty of free RAM, CPU, etc., everything works well. When I switch one node off, all the resources move to the other without any problems, and vice versa. That's what I need :)
> 
> Our staging environment has rather weak hardware (that's ok - it's just staging :) ) and is rather busy. Sometimes it doesn't even have enough CPU or disk speed to be stable. When that happens, some of the cluster resources fail (which I consider normal), but I also see the following crm output:
> 
> Node db-node1: standby
> Node db-node2: standby
> Online: [ lb-node1 lb-node2 ]
> 
> Pgpool2	(ocf::heartbeat:pgpool):	FAILED (unmanaged) [ lb-node2 lb-node1 ]
> Resource Group: IPGroup
>     FailoverIP1	(ocf::heartbeat:IPaddr2):	Started [ lb-node2 lb-node1 ]
> 
> As you can see, the resource ocf::heartbeat:IPaddr2 is started on both nodes (lb-node2 and lb-node1). But I can't figure out how that could happen.

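Running with stonith-enabled=false is one especially good way, particularly in an unstable environment. Without fencing, the cluster has no way to make sure a resource is really stopped on a misbehaving node before starting it somewhere else.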
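
For reference, a minimal sketch of turning fencing on with the crm shell. It assumes IPMI-capable hardware and the external/ipmi stonith plugin from cluster-glue. The resource and constraint names, the addresses and the credentials are hypothetical placeholders, not taken from your configuration, so check the plugin's parameters with "crm ra info stonith:external/ipmi" before using anything like this.

```shell
# Hypothetical example: one fencing device per active node.
# ipaddr/userid/passwd are placeholders for each node's IPMI/BMC details.
crm configure primitive fence-lb-node1 stonith:external/ipmi \
    params hostname=lb-node1 ipaddr=192.0.2.11 userid=admin passwd=secret interface=lan \
    op monitor interval=60s
crm configure primitive fence-lb-node2 stonith:external/ipmi \
    params hostname=lb-node2 ipaddr=192.0.2.12 userid=admin passwd=secret interface=lan \
    op monitor interval=60s

# A node should not run the device that fences itself.
crm configure location fence-lb-node1-placement fence-lb-node1 -inf: lb-node1
crm configure location fence-lb-node2-placement fence-lb-node2 -inf: lb-node2

# Only enable fencing once the devices are configured and tested.
crm configure property stonith-enabled=true
```

With fencing active, a node whose stop operation fails or times out gets powered off instead of leaving the resource in an unknown state.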

It could even be that the resource only shows up as running there because of failed monitor operations and is not actually running (but for safety we have to assume it is).
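
To tell the two cases apart, it may help to list what the cluster believes is active on more than one node and then check each node by hand. The following is only a sketch: multi_active is a hypothetical helper name, and it assumes the status text has the same layout as the output you pasted (resource name first, node list in square brackets).

```shell
# Print the name of every resource whose status line lists more than
# one node inside square brackets, e.g. "Started [ lb-node2 lb-node1 ]".
# Reads status text in crm_mon style from standard input.
multi_active() {
    awk '
        match($0, /\[[^]]*\]/) && $1 != "Online:" && $1 != "OFFLINE:" {
            # The text between the brackets is the node list.
            nodes = substr($0, RSTART + 1, RLENGTH - 2)
            if (split(nodes, parts, " ") > 1) print $1
        }
    '
}

# Usage on a cluster node (crm_mon -1 prints the status once and exits):
#   crm_mon -1 | multi_active
# To see whether the address is really configured on a given node:
#   ip -o -4 addr show | grep -F '111.22.33.44/'
```

If the address turns out to be present on only one node, the second entry is most likely a leftover from a failed monitor, and "crm resource cleanup FailoverIP1" should clear it.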

> 
> this is the output of my crm configure show:
> 
> node db-node1 \
> 	attributes standby=on
> node db-node2 \
> 	attributes standby=on
> node lb-node1
> node lb-node2
> primitive Cachier ocf:site:cachier \
> 	op monitor interval=10s timeout=30s depth=10 \
> 	meta target-role=Started
> primitive FailoverIP1 IPaddr2 \
> 	params ip=111.22.33.44 cidr_netmask=32 iflabel=FAILOVER \
> 	op monitor interval=30s
> primitive Mailer ocf:site:mailer \
> 	meta target-role=Started \
> 	op monitor interval=10s timeout=30s depth=10
> primitive Memcached memcached \
> 	op monitor interval=10s timeout=30s depth=10 \
> 	meta target-role=Started
> primitive Nginx nginx \
> 	params status10url="/nginx_status" testclient=curl port=8091 \
> 	op monitor interval=10s timeout=30s depth=10 \
> 	op start interval=0 timeout=40s \
> 	op stop interval=0 timeout=60s \
> 	meta target-role=Started
> primitive Pgpool2 pgpool \
> 	params checkmethod=pid \
> 	op monitor interval=30s \
> 	op start interval=0 timeout=40s \
> 	op stop interval=0 timeout=60s
> group IPGroup FailoverIP1 \
> 	meta target-role=Started
> colocation ip-with-cachier inf: Cachier IPGroup
> colocation ip-with-mailer inf: Mailer IPGroup
> colocation ip-with-memcached inf: Memcached IPGroup
> colocation ip-with-nginx inf: Nginx IPGroup
> colocation ip-with-pgpool inf: Pgpool2 IPGroup
> order cachier-after-ip inf: IPGroup Cachier
> order mailer-after-ip inf: IPGroup Mailer
> order memcached-after-ip inf: IPGroup Memcached
> order nginx-after-ip inf: IPGroup Nginx
> order pgpool-after-ip inf: IPGroup Pgpool2
> property cib-bootstrap-options: \
> 	expected-quorum-votes=4 \
> 	stonith-enabled=false \
> 	default-resource-stickiness=100 \
> 	maintenance-mode=false \
> 	dc-version=1.1.10-9d39a6b \
> 	cluster-infrastructure="classic openais (with plugin)" \
> 	last-lrm-refresh=1422438144
> 
> 
> So the question is: does my config allow a resource like ocf::heartbeat:IPaddr2 to be started on multiple nodes simultaneously? Is it something that can normally happen? Or is it happening because of the shortage of computing power which I described earlier? :)
> How can I prevent this from happening? Is this a case that is normally supposed to be solved by STONITH?
> 
> Thanks in advance.
> 
> --
> Best regards,
> Sergey Arlashin