[Pacemaker] [Problem] About the replacement of the master/slave resource.
Andrew Beekhof
andrew at beekhof.net
Tue Sep 11 11:21:44 UTC 2012
On Mon, Sep 10, 2012 at 4:42 PM, <renayama19661014 at ybb.ne.jp> wrote:
> Hi All,
>
> We tested how a failure of a clone resource affects a Master/Slave
> resource that is combined with it.
>
> When the clone resource fails, the Master and Slave instances of the
> Master/Slave resource swap nodes.
>
> We reproduced the problem with the following procedure.
>
>
> Step1) We start the cluster and load the CIB.
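>
> For example, a prepared CIB can be loaded with (the file name here is
> an assumption):
>
>   cibadmin --replace --xml-file cib.xml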
>
> ============
> Last updated: Mon Sep 10 15:26:25 2012
> Stack: Heartbeat
> Current DC: drbd2 (08607c71-da7b-4abf-b6d5-39ee39552e89) - partition with quorum
> Version: 1.0.12-c6770b8
> 2 Nodes configured, unknown expected votes
> 6 Resources configured.
> ============
>
> Online: [ drbd1 drbd2 ]
>
> Resource Group: grpPostgreSQLDB
>     prmApPostgreSQLDB (ocf::pacemaker:Dummy): Started drbd1
> Resource Group: grpStonith1
>     prmStonith1-2 (stonith:external/ssh): Started drbd2
>     prmStonith1-3 (stonith:meatware): Started drbd2
> Resource Group: grpStonith2
>     prmStonith2-2 (stonith:external/ssh): Started drbd1
>     prmStonith2-3 (stonith:meatware): Started drbd1
> Master/Slave Set: msDrPostgreSQLDB
>     Masters: [ drbd1 ]
>     Slaves: [ drbd2 ]
> Clone Set: clnDiskd1
>     Started: [ drbd1 drbd2 ]
> Clone Set: clnPingd
>     Started: [ drbd1 drbd2 ]
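>
> For reference, a minimal sketch (crm shell syntax) of the kind of
> constraint this configuration implies; the resource and attribute
> names match the output above, but the constraint ID and the
> threshold are assumptions:
>
>   # keep the Master role away from any node whose ping attribute
>   # is missing or too low (threshold assumed)
>   location loc-msDrPostgreSQLDB msDrPostgreSQLDB \
>       rule $role=Master -inf: not_defined default_ping_set \
>       or default_ping_set lt 100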
>
> Step2) We cause a monitor failure in pingd on drbd1.
>
> [root@drbd1 ~]# rm -rf /var/run/pingd-default_ping_set
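>
> Given the path, this is presumably the state file the pingd daemon
> keeps at runtime, so the next recurring monitor on drbd1 reports
> rc=7 (OCF_NOT_RUNNING), as seen in Step 3. The failure can be
> observed with, for example:
>
>   crm_mon -1 -f
>
> which prints the cluster status once, including fail counts.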
>
> Step3) Failover completes.
>
> ============
> Last updated: Mon Sep 10 15:27:08 2012
> Stack: Heartbeat
> Current DC: drbd2 (08607c71-da7b-4abf-b6d5-39ee39552e89) - partition with quorum
> Version: 1.0.12-c6770b8
> 2 Nodes configured, unknown expected votes
> 6 Resources configured.
> ============
>
> Online: [ drbd1 drbd2 ]
>
> Resource Group: grpPostgreSQLDB
>     prmApPostgreSQLDB (ocf::pacemaker:Dummy): Started drbd2
> Resource Group: grpStonith1
>     prmStonith1-2 (stonith:external/ssh): Started drbd2
>     prmStonith1-3 (stonith:meatware): Started drbd2
> Resource Group: grpStonith2
>     prmStonith2-2 (stonith:external/ssh): Started drbd1
>     prmStonith2-3 (stonith:meatware): Started drbd1
> Master/Slave Set: msDrPostgreSQLDB
>     Masters: [ drbd2 ]
>     Stopped: [ prmDrPostgreSQLDB:1 ]
> Clone Set: clnDiskd1
>     Started: [ drbd1 drbd2 ]
> Clone Set: clnPingd
>     Started: [ drbd2 ]
>     Stopped: [ prmPingd:0 ]
>
> Failed actions:
> prmPingd:0_monitor_10000 (node=drbd1, call=14, rc=7, status=complete): not running
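>
> For reference, rc=7 is OCF_NOT_RUNNING. After the cause is repaired,
> the failure can be cleared (so that pingd is restarted on drbd1)
> with, for example:
>
>   crm resource cleanup prmPingd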
>
>
>
> However, the log shows that the Master and Slave instances were swapped:
>
> Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Move resource prmApPostgreSQLDB (Started drbd1 -> drbd2)
> Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave resource prmStonith1-2 (Started drbd2)
> Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave resource prmStonith1-3 (Started drbd2)
> Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave resource prmStonith2-2 (Started drbd1)
> Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave resource prmStonith2-3 (Started drbd1)
> Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Move resource prmDrPostgreSQLDB:0 (Master drbd1 -> drbd2)
> Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Stop resource prmDrPostgreSQLDB:1 (drbd2)
> Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave resource prmDiskd1:0 (Started drbd1)
> Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave resource prmDiskd1:1 (Started drbd2)
> Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Stop resource prmPingd:0 (drbd1)
> Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave resource prmPingd:1 (Started drbd2)
>
> This swap is unnecessary: the Slave should simply be promoted to Master, and the failed Master should only be stopped.
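>
> The decision can be replayed offline from the pengine input that
> produced this transition; crm_simulate is the Pacemaker 1.1 tool,
> ptest plays the same role on 1.0, and the file name below is an
> assumption:
>
>   crm_simulate -S -x /var/lib/pengine/pe-input-42.bz2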
>
> However, this problem seems to be fixed in Pacemaker 1.1.
>
> Will a fix be possible for Pacemaker 1.0?
> Since the placement processing differs greatly between Pacemaker 1.0 and 1.1, I suspect that backporting a fix would be difficult.
You're probably right. I will have a look soon.
>
> * This problem may already have been reported as a known issue.
> * I have registered it in Bugzilla:
> * http://bugs.clusterlabs.org/show_bug.cgi?id=5103
great :)
>
> Best Regards,
> Hideo Yamauchi.
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org