[Pacemaker] Why does monitor fail in my RA?

Wed May 9 11:57:21 UTC 2012

Hi,

On Wed, Apr 25, 2012 at 10:41:05PM +0200, Greg wrote:
> 
> Hi,
> 
> I'm trying to write a redis resource agent that works in master-slave mode. My

Are you aware of the pull request for a redis resource agent?

https://github.com/ClusterLabs/resource-agents/pull/37

It's been there for a while, blocked mainly because it uses
Debian-specific daemon start/stop machinery.
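For illustration only: one distribution-neutral alternative to Debian's start-stop-daemon is to track the daemon through a pidfile and probe it with `kill -0`. This is a minimal sketch, not code from the pull request; the function name `redis_pid_alive` and any pidfile path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the process whose PID is stored in the
# given pidfile is still alive, 1 otherwise. Uses only POSIX sh and kill -0,
# so it does not depend on Debian's start-stop-daemon.
redis_pid_alive() {
    pidfile="$1"
    # A missing or empty pidfile means the daemon is not running.
    [ -s "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the PID exists
    # and that we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}
```

A start action would launch redis-server itself and write the pidfile; status and monitor can then call `redis_pid_alive /path/to/redis.pid` (path assumed) before querying redis.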

Thanks,

Dejan

> configuration:
> node s1
> node s2
> primitive ip-redis ocf:heartbeat:IPaddr2 \
>         params ip="192.168.1.15" nic="eth0" cidr_netmask="24" \
>         op monitor interval="10s" timeout="30s" \
>         meta target-role="Started"
> primitive redis-server ocf:implix:redis4 \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s" \
>         op monitor interval="5s" role="Master" timeout="60s" \
>         op monitor interval="10s" role="Slave" timeout="60s" \
>         params masterip="192.168.1.15"
> ms redis-ms redis-server \
>         meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" target-role="Master"
> colocation co-redis-ms inf: ip-redis redis-ms:Master
> order or-redis inf: redis-ms:promote ip-redis:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>         cluster-infrastructure="openais" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="false" \
>         expected-quorum-votes="2" \
>         default-action-timeout="20s" \
>         last-lrm-refresh="1335271825" \
>         default-resource-stickiness="10"
> 
> 
> To simplify the RA, all redis nodes start as slaves (that's why I need
> to pass masterip in the configuration).
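A sketch of how start and promote could steer replication with Redis's `SLAVEOF` command, assuming `redis-cli` is on the PATH and redis listens on the default port 6379. The function names and the `REDIS_CLI`/`REDIS_PORT` overrides are hypothetical; the master address would come from the `masterip` parameter, which the RA sees as `OCF_RESKEY_masterip`.

```shell
#!/bin/sh
# Command used to talk to the local redis; overridable (e.g. REDIS_CLI=echo
# for a dry run that only prints the arguments).
REDIS_CLI=${REDIS_CLI:-redis-cli}
# Assumed port, used both locally and on the master.
REDIS_PORT=${REDIS_PORT:-6379}

# Hypothetical: make the local instance replicate from the given master IP.
redis_become_slave() {
    $REDIS_CLI -p "$REDIS_PORT" SLAVEOF "$1" "$REDIS_PORT"
}

# Hypothetical: stop replicating so the local instance acts as master
# (what a promote action would do).
redis_become_master() {
    $REDIS_CLI -p "$REDIS_PORT" SLAVEOF NO ONE
}
```

In start, `redis_become_slave "$OCF_RESKEY_masterip"` puts every node in the slave role, and only promote switches one node to master.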
>
> The script works well: it promotes the secondary when the master node
> is down, but only a few times. At some point, sometimes after 2 or 3
> master failures (I kill the process manually), I get this error:
> redis-server:0_monitor_5000 (node=s1, call=16, rc=9,
> status=complete): master (failed)
> 
> My monitor function (simplified, with overhead removed and some comments
> added) is:
> redis_monitor() {
>         # I set score 10 for master, 5 for slave.
>         # If no score is set yet, -G -q returns no value, which would
>         # break the -eq test below, so default to 0.
>         CURSCORE=$($CRM_MASTER -G -q 2>/dev/null)
>         CURSCORE=${CURSCORE:-0}
>         logger "redis_monitor: score $CURSCORE"
>         local state
>         redis_state
>
>         # RET holds the current local redis state
>         state=$(echo "${RET}" | cut -d':' -f2 | tr -d '\r')
>
>         if [ "${state}" = "master" ]; then
>                 $CRM_MASTER -v $CRM_MASTER_SCORE # score is 10
>                 exit $OCF_RUNNING_MASTER
>         fi
>
>         if [ "${state}" = "slave" ]; then
>                 $CRM_MASTER -v $CRM_SLAVE_SCORE # score is 5
>                 exit $OCF_SUCCESS
>         fi
>
>         # neither slave nor master, so the resource has failed
>         $CRM_MASTER -l reboot -D
>         if [ "$CURSCORE" -eq "$CRM_MASTER_SCORE" ]; then
>                 exit $OCF_FAILED_MASTER
>         fi
>
>         exit $OCF_NOT_RUNNING
> }
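The function relies on a `redis_state` helper that is not shown. A minimal sketch of what it might look like, assuming `redis-cli info` is reachable locally and reports a `role:master` or `role:slave` line terminated by CRLF; both functions below are hypothetical, and `parse_role` isolates the parsing so it can be checked without a running server.

```shell
#!/bin/sh
# Hypothetical parser: read `redis-cli info` output on stdin and print the
# value of the "role" field (master or slave). INFO lines end in CRLF, so
# the trailing \r is stripped, as redis_monitor also does.
parse_role() {
    grep '^role:' | head -n 1 | cut -d':' -f2 | tr -d '\r'
}

# Hypothetical redis_state: store the raw "role:..." line in RET, matching
# how redis_monitor consumes it. If redis is down or hanging, RET ends up
# empty and redis_monitor falls through to its failure branch.
redis_state() {
    RET=$(redis-cli info 2>/dev/null | grep '^role:')
}
```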
> 
> From my logs I know that the monitor function returned
> OCF_FAILED_MASTER when the master was down, and then this error occurred:
> redis-server:0_monitor_5000 (node=s1, call=16, rc=9,
> status=complete): master (failed)
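The `rc=9` in this message is the numeric value of OCF_FAILED_MASTER, so Pacemaker is recording exactly what the monitor returned. A small lookup sketch over the standard OCF exit codes (the same values an RA gets from ocf-shellfuncs) helps read such log lines; the function name is hypothetical.

```shell
#!/bin/sh
# Map an OCF resource-agent exit code (the "rc=" in Pacemaker's failure
# messages) to its symbolic name, using the standard OCF values.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "unknown ($1)" ;;
    esac
}
```

For example, `ocf_rc_name 9` prints `OCF_FAILED_MASTER`.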
> 
> After that, the failed master is no longer monitored on that node until
> I run a cleanup:
> # crm resource cleanup redis-server:0
> 
> 
> My questions:
> 1) What am I doing wrong? How can I fix this?
> I've tried on-fail="restart", but it did not help.
>
> 2) With an older version of redis (2.3), if the master fails, redis
> hangs for some time (21-24 seconds). Even though I set a higher timeout
> on the monitor operations, they still time out after 20 seconds. Why?
> (Changing default-action-timeout to a higher value resolved this, but I
> think the per-operation timeout should be enough.)
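To check which timeout actually reaches the agent, the RA can log the value it is given. This sketch assumes Pacemaker exports the per-operation timeout to the agent in milliseconds as `OCF_RESKEY_CRM_meta_timeout`; the helper name is hypothetical, and the 20000 ms fallback only mirrors `default-action-timeout="20s"` from the configuration.

```shell
#!/bin/sh
# Hypothetical helper: print the timeout (in whole seconds) that the cluster
# applies to the current operation. The value arrives in milliseconds; fall
# back to 20000 ms when the variable is not set.
op_timeout_secs() {
    ms=${OCF_RESKEY_CRM_meta_timeout:-20000}
    echo $((ms / 1000))
}
```

Calling `logger "redis_monitor: timeout $(op_timeout_secs)s"` at the top of redis_monitor would show whether the monitor runs with the configured 60s or with the 20s default.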
> 
> 
> 
> --
> Greg
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org