[Pacemaker] master-slave resource repeats restart

Mon Oct 15 14:46:55 UTC 2012

----- Original Message -----
> From: "Kazunori INOUE" <inouekazu at intellilink.co.jp>
> To: "pacemaker at oss" <pacemaker at oss.clusterlabs.org>
> Cc: shimazakik at intellilink.co.jp
> Sent: Monday, October 15, 2012 4:21:27 AM
> Subject: [Pacemaker] master-slave resource repeats restart
> 
> Hi,
> 
> I am using Pacemaker-1.1.
> - pacemaker f722cf1ff9 (2012 Oct10)
> - corosync  dc7002195a (2012 Oct11)
> 
> If the monitor operation of a master resource fails (with on-fail set
> to "stop"), the resource is restarted repeatedly on the other node.

Weird. So we stop the resource on all nodes, but then recover it on the nodes that didn't have the failure. That doesn't seem right. Please open a new issue at bugs.clusterlabs.org for this.

-- Vossel

> [test case]
> 1. Use the Stateful RA, with on-fail="stop" set on the Master's monitor operation.
> 
>   [configuration of master-slave resource]
>    ms msAP prmAP \
>      meta master-max="1" master-node-max="1" \
>           clone-max="2" clone-node-max="1"
> 
>    primitive prmAP ocf:pacemaker:Stateful \
>       :
>      op monitor role="Master" interval="10s" timeout="20s" \
>        on-fail="stop" \
>       :
> 
>    # crm_mon -rfA1
>    Last updated: Mon Oct 15 16:09:57 2012
>    Last change: Mon Oct 15 16:09:49 2012 via cibadmin on vm5
>    Stack: corosync
>    Current DC: vm5 (2439358656) - partition with quorum
>    Version: 1.1.8-f722cf1
>    2 Nodes configured, unknown expected votes
>    4 Resources configured.
> 
> 
>    Online: [ vm5 vm6 ]
> 
>    Full list of resources:
> 
>     Master/Slave Set: msAP [prmAP]
>         Masters: [ vm5 ]
>         Slaves: [ vm6 ]
>     Clone Set: clnPingd [prmPingd]
>         Started: [ vm5 vm6 ]
> 
>    Node Attributes:
>    * Node vm5:
>        + default_ping_set                  : 100
>        + master-prmAP                      : 10
>    * Node vm6:
>        + default_ping_set                  : 100
>        + master-prmAP                      : 5
> 
>    Migration summary:
>    * Node vm5:
>    * Node vm6:
> 
> 2. Make the master resource on vm5 fail:
> 
>    # echo a >> /var/run/Stateful-prmAP.state
> 
>    The master-slave resource then restarts repeatedly on vm6,
>    cycling through the following states (a) to (c).
> 
>   (a)
>    Full list of resources:
> 
>     Master/Slave Set: msAP [prmAP]
>         Stopped: [ prmAP:0 prmAP:1 ]
>     Clone Set: clnPingd [prmPingd]
>         Started: [ vm5 vm6 ]
> 
>    Node Attributes:
>    * Node vm5:
>        + default_ping_set                  : 100
>    * Node vm6:
>        + default_ping_set                  : 100
> 
>   (b)
>    Full list of resources:
> 
>     Master/Slave Set: msAP [prmAP]
>         Slaves: [ vm6 ]
>         Stopped: [ prmAP:1 ]
>     Clone Set: clnPingd [prmPingd]
>         Started: [ vm5 vm6 ]
> 
>    Node Attributes:
>    * Node vm5:
>        + default_ping_set                  : 100
>    * Node vm6:
>        + default_ping_set                  : 100
>        + master-prmAP                      : 5
> 
>   (c)
>    Full list of resources:
> 
>     Master/Slave Set: msAP [prmAP]
>         Masters: [ vm6 ]
>         Stopped: [ prmAP:1 ]
>     Clone Set: clnPingd [prmPingd]
>         Started: [ vm5 vm6 ]
> 
>    Node Attributes:
>    * Node vm5:
>        + default_ping_set                  : 100
>    * Node vm6:
>        + default_ping_set                  : 100
>        + master-prmAP                      : 10
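
For anyone reproducing this, the (a) to (c) loop can be spotted from saved crm_mon output rather than by eye. Below is a rough Python sketch, assuming the text layout shown in the listings above (the 1.1.8-style "Master/Slave Set" block). The function names ms_state and count_restarts are hypothetical helpers, not part of Pacemaker, and the parsing may need adjusting for other versions.

```python
import re

# Hypothetical helpers for illustration only; they are not part of Pacemaker.
# Assumption: the input is plain `crm_mon -1` style text, laid out like the
# listings in this thread.

def ms_state(crm_mon_text, ms_name="msAP"):
    """Classify a master/slave set as 'master', 'slave' or 'stopped'.

    Returns None if the set does not appear in the text at all.
    """
    roles = {}
    found = False
    for line in crm_mon_text.splitlines():
        stripped = line.strip()
        if not found:
            # Look for the header line of the requested set.
            if stripped.startswith("Master/Slave Set: " + ms_name + " "):
                found = True
            continue
        # Lines directly under the header list instances per role.
        m = re.match(r"(Masters|Slaves|Stopped): \[ (.*) \]$", stripped)
        if not m:
            break  # first non-role line ends the block
        roles[m.group(1)] = m.group(2).split()
    if not found:
        return None
    if roles.get("Masters"):
        return "master"
    if roles.get("Slaves"):
        return "slave"
    return "stopped"


def count_restarts(states):
    """Count how often a promoted set drops straight back to fully stopped."""
    return sum(
        1
        for prev, cur in zip(states, states[1:])
        if prev == "master" and cur == "stopped"
    )
```

Feeding ms_state one crm_mon snapshot per polling interval and passing the resulting list to count_restarts gives a count that keeps growing while the loop described here is active. A healthy cluster would settle on "master" and stay there.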
> 
> Best Regards,
> Kazunori INOUE