[Pacemaker] About clone instance replacement and handling of the fail count
renayama19661014 at ybb.ne.jp
Fri Mar 12 06:30:47 UTC 2010
Hi,
We tested failure handling of clone resources.
I confirmed the behavior with the following procedure.
Step1) I start all nodes and apply the cib.xml configuration.
============
Last updated: Fri Mar 12 14:53:38 2010
Stack: openais
Current DC: srv01 - partition with quorum
Version: 1.0.7-049006f172774f407e165ec82f7ee09cb73fd0e7
4 Nodes configured, 2 expected votes
13 Resources configured.
============
Online: [ srv01 srv02 srv03 srv04 ]
Resource Group: UMgroup01
UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
UmIPaddr (ocf::heartbeat:Dummy): Started srv01
UmDummy01 (ocf::heartbeat:Dummy): Started srv01
UmDummy02 (ocf::heartbeat:Dummy): Started srv01
Resource Group: OVDBgroup02-1
prmExPostgreSQLDB1 (ocf::heartbeat:Dummy): Started srv01
prmFsPostgreSQLDB1-1 (ocf::heartbeat:Dummy): Started srv01
prmFsPostgreSQLDB1-2 (ocf::heartbeat:Dummy): Started srv01
prmFsPostgreSQLDB1-3 (ocf::heartbeat:Dummy): Started srv01
prmIpPostgreSQLDB1 (ocf::heartbeat:Dummy): Started srv01
prmApPostgreSQLDB1 (ocf::heartbeat:Dummy): Started srv01
Resource Group: OVDBgroup02-2
prmExPostgreSQLDB2 (ocf::heartbeat:Dummy): Started srv02
prmFsPostgreSQLDB2-1 (ocf::heartbeat:Dummy): Started srv02
prmFsPostgreSQLDB2-2 (ocf::heartbeat:Dummy): Started srv02
prmFsPostgreSQLDB2-3 (ocf::heartbeat:Dummy): Started srv02
prmIpPostgreSQLDB2 (ocf::heartbeat:Dummy): Started srv02
prmApPostgreSQLDB2 (ocf::heartbeat:Dummy): Started srv02
Resource Group: OVDBgroup02-3
prmExPostgreSQLDB3 (ocf::heartbeat:Dummy): Started srv03
prmFsPostgreSQLDB3-1 (ocf::heartbeat:Dummy): Started srv03
prmFsPostgreSQLDB3-2 (ocf::heartbeat:Dummy): Started srv03
prmFsPostgreSQLDB3-3 (ocf::heartbeat:Dummy): Started srv03
prmIpPostgreSQLDB3 (ocf::heartbeat:Dummy): Started srv03
prmApPostgreSQLDB3 (ocf::heartbeat:Dummy): Started srv03
Resource Group: grpStonith1
prmStonithN1 (stonith:external/ssh): Started srv04
Resource Group: grpStonith2
prmStonithN2 (stonith:external/ssh): Started srv01
Resource Group: grpStonith3
prmStonithN3 (stonith:external/ssh): Started srv02
Resource Group: grpStonith4
prmStonithN4 (stonith:external/ssh): Started srv03
Clone Set: clnUMgroup01
Started: [ srv01 srv04 ]
Clone Set: clnPingd
Started: [ srv01 srv02 srv03 srv04 ]
Clone Set: clnDiskd1
Started: [ srv01 srv02 srv03 srv04 ]
Clone Set: clnG3dummy1
Started: [ srv01 srv02 srv03 srv04 ]
Clone Set: clnG3dummy2
Started: [ srv01 srv02 srv03 srv04 ]
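(For reference: the status above is crm_mon output. Running crm_mon with the -f option also prints the Migration summary with the per-instance fail-counts that appears later in this mail.)

# Show the cluster status once, including failure counts
crm_mon -1 -f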
Step2) I cause a failure of the clnUMgroup01 clone on the N1 (srv01) node.
[root@srv01 ~]# rm -rf /var/run/heartbeat/rsctmp/Dummy-clnUMdummy02\:0.state
* The clone instances are replaced (their instance numbers are swapped).
[root@srv01 ~]# ls /var/run/heartbeat/rsctmp/Dummy-clnUMdummy0*
/var/run/heartbeat/rsctmp/Dummy-clnUMdummy01:1.state
/var/run/heartbeat/rsctmp/Dummy-clnUMdummy02:1.state
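(Deleting the state file works as fault injection because ocf:heartbeat:Dummy treats the presence of its state file as "running". A simplified sketch of the agent's monitor logic, not the verbatim agent code:)

# Sketch of the ocf:heartbeat:Dummy monitor action (simplified)
dummy_monitor() {
    if [ -f "${OCF_RESKEY_state}" ]; then
        return $OCF_SUCCESS      # state file present -> reported as running
    fi
    return $OCF_NOT_RUNNING      # state file missing -> monitor failure, fail-count rises
}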
Step3) I cause a failure of the clnUMgroup01 clone on the N1 (srv01) node again.
[root@srv01 ~]# rm -rf /var/run/heartbeat/rsctmp/Dummy-clnUMdummy02\:1.state
[root@srv01 ~]# ls /var/run/heartbeat/rsctmp/Dummy-clnUMdummy0*
/var/run/heartbeat/rsctmp/Dummy-clnUMdummy01:0.state
/var/run/heartbeat/rsctmp/Dummy-clnUMdummy02:0.state
* The clone instances are replaced again (the instance numbers are swapped back).
============
Last updated: Fri Mar 12 14:56:19 2010
Stack: openais
Current DC: srv01 - partition with quorum
Version: 1.0.7-049006f172774f407e165ec82f7ee09cb73fd0e7
4 Nodes configured, 2 expected votes
13 Resources configured.
============
Online: [ srv01 srv02 srv03 srv04 ]
(snip)
Migration summary:
* Node srv02:
* Node srv03:
* Node srv04:
* Node srv01:
clnUMdummy02:0: migration-threshold=5 fail-count=1
clnUMdummy02:1: migration-threshold=5 fail-count=1
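(The fail-counts in this Migration summary can also be inspected and reset per instance from the crm shell; the exact syntax may differ slightly between Pacemaker versions:)

# Show the fail-count of one clone instance on srv01
crm resource failcount clnUMdummy02:0 show srv01
# Clear the failure state of that instance after the test
crm resource cleanup clnUMdummy02:0 srv01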
Step4) I cause a failure of the clnUMgroup01 clone on the N4 (srv04) node.
[root@srv04 ~]# rm -rf /var/run/heartbeat/rsctmp/Dummy-clnUMdummy02\:1.state
[root@srv04 ~]# ls /var/run/heartbeat/rsctmp/Dummy-clnUMdummy02*
/var/run/heartbeat/rsctmp/Dummy-clnUMdummy02:1.state
* The clone instances are not replaced.
Step5) I cause a failure of the clnUMgroup01 clone on the N4 (srv04) node again.
* The clone instances are not replaced.
(snip)
Migration summary:
* Node srv02:
* Node srv03:
* Node srv04:
clnUMdummy02:1: migration-threshold=5 fail-count=2
* Node srv01:
clnUMdummy02:0: migration-threshold=5 fail-count=1
clnUMdummy02:1: migration-threshold=5 fail-count=1
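(The counts stay attached to :0 and :1 separately because each fail-count is stored as its own transient attribute, named fail-count-<resource>, in the status section of the CIB. It can be seen directly, for example:)

# Dump the status section and pick out the per-instance fail-count attributes
cibadmin -Q -o status | grep fail-count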
Step6) I cause failures of the clnUMgroup01 clone repeatedly on the N4 (srv04) and N1 (srv01) nodes.
* On the N4 node, clnUMdummy02 reaches its limit after five failures, but on the N1 node it takes
many more failures, because each replacement moves the failure onto a different instance.
(snip)
Clone Set: clnUMgroup01
Started: [ srv01 ]
Stopped: [ clnUmResource:1 ]
Clone Set: clnPingd
Started: [ srv01 srv02 srv03 srv04 ]
Clone Set: clnDiskd1
Started: [ srv01 srv02 srv03 srv04 ]
Clone Set: clnG3dummy1
Started: [ srv01 srv02 srv03 srv04 ]
Clone Set: clnG3dummy2
Started: [ srv01 srv02 srv03 srv04 ]
Migration summary:
* Node srv02:
* Node srv03:
* Node srv04:
clnUMdummy02:1: migration-threshold=5 fail-count=5
* Node srv01:
clnUMdummy02:0: migration-threshold=5 fail-count=3
clnUMdummy02:1: migration-threshold=5 fail-count=3
When "globally-unique=false", is it correct that the clone instances started on the N1 (srv01) node
are replaced? And is it correct behavior that no replacement happens when a clone fails on the
N4 (srv04) node?
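(For context, clnUMgroup01 is declared as an anonymous clone. A minimal sketch of such a declaration in crm shell syntax; the meta values here are illustrative, not our exact configuration:)

# Anonymous clone: instances are interchangeable copies of clnUmResource
clone clnUMgroup01 clnUmResource \
        meta clone-max=2 clone-node-max=1 globally-unique=false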
Furthermore, we think there is a problem because the replacement behavior differs between nodes.
Even if this clone replacement is correct, the number of failures needed to reach the threshold on
the N1 (srv01) node differs from the N4 (srv04) node.
Because of this behavior, we cannot set a reliable failure limit for the clone (see the sketch
below).
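(The limit in question is the migration-threshold=5 visible in the Migration summaries above. One way to set it, in crm shell syntax as a cluster-wide resource default; it can also be set per resource:)

# Apply migration-threshold to all resources as a default
crm configure rsc_defaults migration-threshold=5

Because each replacement moves the failure onto a different instance name, neither :0 nor :1 reaches 5 on srv01 even after many failures.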
Is this by design, or a bug? (Or is it already fixed in the development version?)
Is some setting in cib.xml necessary for it to operate correctly?
If there is a correct way to configure cib.xml, please let us know.
Because the hb_report archive is large, I have not attached it.
If any information from hb_report is needed to resolve the problem, please let me know.
Best Regards,
Hideo Yamauchi.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cib.zip
Type: application/x-zip-compressed
Size: 10575 bytes
Desc: 1726557689-cib.zip
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100312/c1b456b9/attachment-0003.bin>