[Pacemaker] [Problem] A monitor with a long start-delay does not stop.

Tue Oct 12 06:01:58 UTC 2010

Hi Andrew,

> > Funnily enough I was just looking at that message and saw that the
> > code relevant to this one looked wrong too.
> > 
> > I believe this should fix the issue:
> >    http://hg.clusterlabs.org/pacemaker/1.1/rev/e06810256413
> > 
> > >
> > > I attached the logs and other information to Bugzilla.
> > >
> > >  * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2505
> > 
> > Oops, I didn't see that. I should have included the bug number in the commit :-(
> 
> OK.
> I will test your revision.

Sorry for the late confirmation.

I confirmed that your revision solves the problem.

In addition, I applied a similar change to 1.0 and confirmed that the problem is fixed there as well.

I added a comment to Bugzilla.

Best Regards,
Hideo Yamauchi.

--- renayama19661014 at ybb.ne.jp wrote:

> Hi Andrew,
> 
> Thank you for your comment.
> 
> OK.
> I will test your revision.
> 
> Best Regards,
> Hideo Yamauchi.
> 
> --- Andrew Beekhof <andrew at beekhof.net> wrote:
> 
> > On Thu, Oct 7, 2010 at 8:39 AM,  <renayama19661014 at ybb.ne.jp> wrote:
> > > Hi,
> > >
> > > I ran the following steps to check the issue reported on the mailing list:
> > >
> > >  * http://www.gossamer-threads.com/lists/linuxha/pacemaker/66939
> > >
> > >
> > > Step1) I prepared a cib.xml containing a monitor whose start-delay is set to more than five minutes.
> > > Step2) I started two nodes and loaded the cib.
> > >
> > > ============
> > > Last updated: Thu Oct  7 14:58:09 2010
> > > Stack: Heartbeat
> > > Current DC: srv02 (1f8dd092-d82b-47eb-86c4-e011a2cd11b3) - partition WITHOUT quorum
> > > Version: 1.0.9-860b32388908c6a345786d4ecd2e2a3bec780dd2
> > > 2 Nodes configured, unknown expected votes
> > > 1 Resources configured.
> > > ============
> > >
> > > Online: [ srv01 srv02 ]
> > >
> > >  Resource Group: grpDummy
> > >      prmFsPostgreSQLDB1-3      (ocf::heartbeat:Dummy): Started srv01
> > >      prmIpPostgreSQLDB2 (ocf::heartbeat:Dummy): Started srv01
> > >
> > > Step3) I caused monitor errors on the resources one after another.
> > >
> > > ============
> > > Last updated: Thu Oct  7 15:20:01 2010
> > > Stack: Heartbeat
> > > Current DC: srv02 (d3fe8b08-20d9-4990-aebb-56a0675af5bd) - partition WITHOUT quorum
> > > Version: 1.0.9-860b32388908c6a345786d4ecd2e2a3bec780dd2
> > > 2 Nodes configured, unknown expected votes
> > > 1 Resources configured.
> > > ============
> > >
> > > Online: [ srv01 srv02 ]
> > >
> > >  Resource Group: grpDummy
> > >      prmFsPostgreSQLDB1-3      (ocf::heartbeat:Dummy): Started srv02
> > >      prmIpPostgreSQLDB2 (ocf::heartbeat:Dummy): Started srv02
> > >
> > > Migration summary:
> > > * Node srv02:
> > > * Node srv01:
> > >    prmIpPostgreSQLDB2: migration-threshold=1 fail-count=1
> > >    prmFsPostgreSQLDB1-3: migration-threshold=1 fail-count=1
> > >
> > > Failed actions:
> > >     prmIpPostgreSQLDB2_monitor_60000 (node=srv01, call=7, rc=7, status=complete): not running
> > >     prmFsPostgreSQLDB1-3_monitor_30000 (node=srv01, call=5, rc=7, status=complete): not running
> > >
> > > Step4) The resources fail over to srv02, but the monitors on srv01 do not stop.
> > >
> > > [root@srv01 ~]# !tail
> > > tail -f /var/log/ha-log
> > > Oct  7 15:27:27 srv01 lrmd: [15792]: debug: rsc:prmFsPostgreSQLDB1-3:5: monitor
> > > Oct  7 15:27:27 srv01 Dummy[16572]: DEBUG: prmFsPostgreSQLDB1-3 monitor : 7
> > > Oct  7 15:27:58 srv01 lrmd: [15792]: debug: rsc:prmFsPostgreSQLDB1-3:5: monitor
> > > Oct  7 15:27:58 srv01 Dummy[16594]: DEBUG: prmFsPostgreSQLDB1-3 monitor : 7
> > > Oct  7 15:27:59 srv01 lrmd: [15792]: debug: rsc:prmIpPostgreSQLDB2:8: monitor
> > > Oct  7 15:27:59 srv01 Dummy[16601]: DEBUG: prmIpPostgreSQLDB2 monitor : 7
> > > Oct  7 15:27:59 srv01 lrmd: [15792]: debug: rsc:prmIpPostgreSQLDB2:7: monitor
> > > Oct  7 15:27:59 srv01 Dummy[16608]: DEBUG: prmIpPostgreSQLDB2 monitor : 7
> > > Oct  7 15:28:28 srv01 lrmd: [15792]: debug: rsc:prmFsPostgreSQLDB1-3:5: monitor
> > > Oct  7 15:28:28 srv01 Dummy[16628]: DEBUG: prmFsPostgreSQLDB1-3 monitor : 7
> > >
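The Step4 log can be summarized mechanically. The following is a minimal Python sketch, not part of Pacemaker or lrmd; the regular expression is an assumption based only on the lrmd debug lines shown above ("rsc:<resource>:<call-id>: monitor"). It groups the lrmd monitor lines by resource and call id. On this excerpt it reports call 5 for prmFsPostgreSQLDB1-3 and calls 7 and 8 for prmIpPostgreSQLDB2, that is, two recurring monitors still running for one resource on srv01, which may be related to the fail-count of 2 seen in Step5.

```python
import re
from collections import defaultdict

# The srv01 ha-log excerpt from Step4, copied verbatim for self-containment.
LOG = """\
Oct  7 15:27:27 srv01 lrmd: [15792]: debug: rsc:prmFsPostgreSQLDB1-3:5: monitor
Oct  7 15:27:27 srv01 Dummy[16572]: DEBUG: prmFsPostgreSQLDB1-3 monitor : 7
Oct  7 15:27:58 srv01 lrmd: [15792]: debug: rsc:prmFsPostgreSQLDB1-3:5: monitor
Oct  7 15:27:58 srv01 Dummy[16594]: DEBUG: prmFsPostgreSQLDB1-3 monitor : 7
Oct  7 15:27:59 srv01 lrmd: [15792]: debug: rsc:prmIpPostgreSQLDB2:8: monitor
Oct  7 15:27:59 srv01 Dummy[16601]: DEBUG: prmIpPostgreSQLDB2 monitor : 7
Oct  7 15:27:59 srv01 lrmd: [15792]: debug: rsc:prmIpPostgreSQLDB2:7: monitor
Oct  7 15:27:59 srv01 Dummy[16608]: DEBUG: prmIpPostgreSQLDB2 monitor : 7
Oct  7 15:28:28 srv01 lrmd: [15792]: debug: rsc:prmFsPostgreSQLDB1-3:5: monitor
Oct  7 15:28:28 srv01 Dummy[16628]: DEBUG: prmFsPostgreSQLDB1-3 monitor : 7
"""

# Assumed line format (inferred from the excerpt only):
#   "... lrmd: [<pid>]: debug: rsc:<resource>:<call-id>: monitor"
# Resource names are assumed not to contain ':'.
LRMD_MONITOR = re.compile(
    r"lrmd: \[\d+\]: debug: rsc:(?P<rsc>[^:]+):(?P<call>\d+): monitor$"
)


def monitor_calls(lines):
    """Map each resource name to the set of lrmd call ids that ran a monitor."""
    calls = defaultdict(set)
    for line in lines:
        match = LRMD_MONITOR.search(line)
        if match:  # the Dummy agent's own DEBUG lines do not match and are skipped
            calls[match.group("rsc")].add(int(match.group("call")))
    return dict(calls)


if __name__ == "__main__":
    for rsc, ids in sorted(monitor_calls(LOG.splitlines()).items()):
        # More than one call id for a resource means more than one
        # recurring monitor is still scheduled for it on this node.
        print(rsc, sorted(ids))
```

Grouping by call id rather than counting lines is deliberate: a single recurring monitor repeats the same call id, so a second id for the same resource is the interesting signal.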
> > > Step5) Afterwards, the fail-count increases unexpectedly.
> > >
> > > ============
> > > Last updated: Thu Oct  7 15:31:21 2010
> > > Stack: Heartbeat
> > > Current DC: srv02 (d3fe8b08-20d9-4990-aebb-56a0675af5bd) - partition WITHOUT quorum
> > > Version: 1.0.9-860b32388908c6a345786d4ecd2e2a3bec780dd2
> > > 2 Nodes configured, unknown expected votes
> > > 1 Resources configured.
> > > ============
> > >
> > > Online: [ srv01 srv02 ]
> > >
> > >  Resource Group: grpDummy
> > >      prmFsPostgreSQLDB1-3      (ocf::heartbeat:Dummy): Started srv02
> > >      prmIpPostgreSQLDB2 (ocf::heartbeat:Dummy): Started srv02
> > >
> > > Migration summary:
> > > * Node srv02:
> > > * Node srv01:
> > >    prmIpPostgreSQLDB2: migration-threshold=1 fail-count=2
> > >    prmFsPostgreSQLDB1-3: migration-threshold=1 fail-count=1
> > >
> > > Failed actions:
> > >     prmIpPostgreSQLDB2_monitor_60000 (node=srv01, call=8, rc=7, status=complete): not running
> > >     prmFsPostgreSQLDB1-3_monitor_30000 (node=srv01, call=5, rc=7, status=complete): not running
> > >
> > >
> > > The following report may be related:
> > >
> > >  * http://www.gossamer-threads.com/lists/linuxha/pacemaker/66939
> > 
> > Funnily enough I was just looking at that message and saw that the
> > code relevant to this one looked wrong too.
> > 
> > I believe this should fix the issue:
> >    http://hg.clusterlabs.org/pacemaker/1.1/rev/e06810256413
> > 
> > >
> > > I attached the logs and other information to Bugzilla.
> > >
> > >  * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2505
> > 
> > Oops, I didn't see that. I should have included the bug number in the commit :-(
> > 
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > 
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> > 