[Pacemaker] on-fail is not effective

Fri Apr 6 21:19:39 UTC 2012

----- Original Message -----
> From: "Kazunori INOUE" <inouekazu at intellilink.co.jp>
> To: "pacemaker at oss" <pacemaker at oss.clusterlabs.org>
> Cc: koichi at intellilink.co.jp
> Sent: Thursday, April 5, 2012 10:08:44 PM
> Subject: [Pacemaker]  on-fail is not effective
> 
> Hi,
> 
> I am using Pacemaker-1.1 (devel:
> 7172b7323bb72c51999ce11c6fa5d3ff0a0a4b4f).
> The "on-fail" setting does not take effect.
> For example, the default action ("restart") is performed even when
> "stop" is specified.

The resource is being stopped, but if nothing prevents it from starting again, it will start once the stop action has completed. This is probably why 'restart' and 'stop' appear to behave the same way.

-- Vossel

> [root@vm1 ~]# crm configure show | grep -A3 "primitive prmDummy1"
> primitive prmDummy1 ocf:pacemaker:Dummy \
>         op start interval="0" timeout="60s" on-fail="restart" \
>         op monitor interval="10s" timeout="60s" on-fail="stop" \
>         op stop interval="0" timeout="60s" on-fail="block"
> [root@vm1 ~]#
> [root@vm1 ~]# crm_mon -f1
> ============
> Last updated: Fri Apr  6 10:13:14 2012
> Last change: Fri Apr  6 10:12:42 2012 via cibadmin on vm1
> Stack: Heartbeat
> Current DC: vm1 (87e0eef1-0d86-4e8a-adfe-51f444a4054f) - partition with quorum
> Version: 1.1.7-7172b73
> 2 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
> 
> Online: [ vm1 vm2 ]
> 
>  prmDummy1      (ocf::pacemaker:Dummy): Started vm1
> 
> Migration summary:
> * Node vm1:
> * Node vm2:
> [root@vm1 ~]#
> [root@vm1 ~]# rm -f /var/run/Dummy-prmDummy1.state
> [root@vm1 ~]# crm_mon -f1
> ============
> Last updated: Fri Apr  6 10:13:33 2012
> Last change: Fri Apr  6 10:12:42 2012 via cibadmin on vm1
> Stack: Heartbeat
> Current DC: vm1 (87e0eef1-0d86-4e8a-adfe-51f444a4054f) - partition with quorum
> Version: 1.1.7-7172b73
> 2 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
> 
> Online: [ vm1 vm2 ]
> 
>  prmDummy1      (ocf::pacemaker:Dummy): Started vm2
> 
> Migration summary:
> * Node vm1:
>    prmDummy1: migration-threshold=1 fail-count=1
> * Node vm2:
> 
> Failed actions:
>     prmDummy1_monitor_10000 (node=vm1, call=4, rc=7, status=complete): not running
> [root@vm1 ~]#
> 
> The attached gdb_pengine.log is a gdb log taken at the time of the
> monitor failure.
> Is this because the second argument (the variable 'key') passed to
> find_rsc_op_entry() is "prmDummy1_last_failure_0"?
> With that key, it seems that the "on-fail" setting cannot be
> identified. (L117~L205)
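
To make the hypothesis in the quoted question concrete, here is a toy Python model of a key-based lookup. It is not Pacemaker's pengine code and says nothing about how find_rsc_op_entry() is really implemented. The table, the lookup_on_fail() helper, the fallback value, and the "_0" keys for start and stop are assumptions made for illustration; only the two keys "prmDummy1_monitor_10000" and "prmDummy1_last_failure_0" appear in the report itself.

```python
# Toy model only: illustrates how a mismatched lookup key could hide a
# configured on-fail value. Names and structure are hypothetical.

# Fallback used by this sketch when no entry matches; the report calls
# "restart" the default action.
DEFAULT_ON_FAIL = "restart"

# on-fail values from the prmDummy1 primitive, keyed as
# <resource>_<operation>_<interval in ms> (key format assumed from the
# "prmDummy1_monitor_10000" entry in the crm_mon output).
OP_ON_FAIL = {
    "prmDummy1_start_0": "restart",
    "prmDummy1_monitor_10000": "stop",
    "prmDummy1_stop_0": "block",
}


def lookup_on_fail(key):
    """Return the on-fail value stored under key, or the fallback if absent."""
    return OP_ON_FAIL.get(key, DEFAULT_ON_FAIL)


# Looking up the monitor operation's own key finds the configured value.
print(lookup_on_fail("prmDummy1_monitor_10000"))   # prints "stop"

# Looking up the key named in the question matches nothing, so the
# fallback is returned and the configured "stop" is never seen.
print(lookup_on_fail("prmDummy1_last_failure_0"))  # prints "restart"
```

If the real lookup behaves like this, a monitor failure recorded under the "last_failure" key would be handled with the default action, which would match the behavior shown in the crm_mon output above.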
> 
> Best Regards,
> Kazunori INOUE
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>