[Pacemaker] Issues with Pacemaker / Corosync

Arnold Krille arnold at arnoldarts.de
Fri Dec 23 23:44:30 CET 2011


Hi,

On Friday 23 December 2011 16:03:37 Aravind M D wrote:
>   I am facing some problems with the corosync and pacemaker implementation. I
> have configured a cluster on Debian squeeze; the packages for corosync and
> pacemaker are installed from backports.
>   I am configuring a two-node cluster and I have configured one resource
> as well. Below is my configuration.
>   root at nagt02a:~# crm configure show
>   node nagt02
>   node nagt02a
>   primitive icinga lsb:icinga \
>           op start interval="0" timeout="30s" \
>           op stop interval="0" timeout="30s" \
>           op monitor interval="30s" \
>           meta multiple-active="stop_start"
>   location prefer-nagt02 icinga 10: nagt02
>   property $id="cib-bootstrap-options" \
>           dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>           cluster-infrastructure="openais" \
>           expected-quorum-votes="2" \
>           stonith-enabled="false" \
>           no-quorum-policy="ignore"
>   Problem 1: When the service is active on nagt02 and I manually start
> the service on cgnagt02a, the service is not being disabled on nagt02a.

I found that it will be stopped, but not as quickly as you might expect. The 
monitor action only runs on the node where the resource is active. But every 
now and then (I think every five to ten minutes, but that is configurable via 
the cluster-recheck-interval property) the cluster re-evaluates the whole 
status and therefore also detects services running where they shouldn't. With 
this you will probably find that once pacemaker sees the second icinga, it 
will shut down both to be sure (your multiple-active="stop_start" setting) and 
restart it on one node.
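If you want such stray instances detected sooner, the re-check interval can be lowered. A minimal sketch, using the same crm shell as your configuration above (the 2-minute value is only an example, not a recommendation):

```
# Lower the periodic re-check so resources running where they
# shouldn't are detected sooner (example value: 2 minutes)
crm configure property cluster-recheck-interval="2min"

# Verify the setting in the bootstrap options
crm configure show cib-bootstrap-options
```

Note that a very short interval causes more frequent policy-engine runs, so don't set it lower than you actually need.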

>   Problem 2: For checking, I stopped the service on nagt02 and made
> some changes to the configuration files so the service won't start again on
> nagt02. What I am testing is: when the node comes back from a failover and
> the service was not able to start on nagt02, it should start on nagt02a.
> But I am getting the error below.
> 
>   root at cgnagt02:~# crm_mon --one-shot
>   Online: [ cgnagt02 cgnagt02a ]
>    icinga (lsb:icinga):   Started cgnagt02 (unmanaged) FAILED
>   Failed actions:
>       icinga_monitor_30000 (node=cgnagt02, call=4, rc=6, status=complete):
> not configured
>       icinga_stop_0 (node=cgnagt02, call=5, rc=6, status=complete): not
> configured

Looks as if making the service "not start" also made the service "not stop": 
both the monitor and the stop action return rc=6 ("not configured"), which 
comes from the init script itself. And pacemaker won't start a service on one 
node while it cannot be sure it was shut down on the other node. Unless you 
configure fencing and the failed host gets killed by that, I guess.
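A sketch of how you might recover from that state, using the resource name from your output (the STONITH primitive and all its parameters are purely illustrative assumptions; substitute a fencing agent and credentials that match your hardware):

```
# Once the init script is fixed, clear the failed stop/monitor
# history so pacemaker re-evaluates the resource:
crm resource cleanup icinga

# To let the cluster recover by fencing instead, a fencing device
# must be configured and STONITH enabled, e.g. (illustrative only):
crm configure primitive st-nagt stonith:external/ipmi \
        params hostname="cgnagt02" ipaddr="192.168.1.10" \
               userid="admin" passwd="secret" interface="lan"
crm configure property stonith-enabled="true"
```

Without fencing, a node that fails to stop a resource leaves that resource in the "unmanaged FAILED" state you are seeing, and the cluster will not risk starting a second copy elsewhere.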