[Pacemaker] cannot handle the failcount for a clone resource using the crm tool

Junko IKEDA ikedaj at intellilink.co.jp
Thu May 21 23:39:22 EDT 2009


Hi,

I filed the bug.
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2120

I could reproduce this behavior with a simple primitive resource.
There are two problems.

(1) The following syntax does not work:
crm(live)resource# failcount <rsc> delete <node>

(2) When using the "set" switch, the fail counts on _all_ nodes are set to 0, not only the one for <node>:
crm(live)resource# failcount <rsc> set <node> 0
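
For reference, these two crm shell commands appear to map to crm_attribute
calls like the ones below (the concrete invocations are the ones found in the
logs and quoted further down in this thread; <rsc> and <node> are placeholders):

# delete <rsc>'s failcount on <node>
crm_attribute -N <node> -n fail-count-<rsc> -D -t status -d 0
# set it to 0
crm_attribute -N <node> -n fail-count-<rsc> -v 0 -t status -d 0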

Thanks,
Junko




> -----Original Message-----
> From: Dejan Muhamedagic [mailto:dejanmm at fastmail.fm]
> Sent: Friday, May 22, 2009 7:53 AM
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] cannot handle the failcount for a clone resource using the crm tool
> 
> Hi Junko-san,
> 
> On Thu, May 21, 2009 at 06:32:52PM +0900, Junko IKEDA wrote:
> > Hi,
> >
> > I have 4 nodes (dl380g5a, dl380g5b, dl380g5c, dl380g5d),
> > and run 1 clone resource with the following configuration.
> > 	clone_max="2"
> > 	clone_node_max="1"
> >
> > (1) Initial state
> > 	dummy:0	dl380g5a
> > 	dummy:1	dl380g5b
> >
> > (2) dummy:1 breaks down and moves to dl380g5c
> > 	dummy:0	dl380g5a
> > 	dummy:1	dl380g5c
> >
> > (3) dummy:1 breaks down again and moves to dl380g5d
> > 	dummy:0	dl380g5a
> > 	dummy:1	dl380g5d
> >
> > (4) Now, the fail counts for dummy:1 are:
> > 	dl380g5c = 1
> > 	dl380g5d = 1
> >
> > I tried to delete the failcount using crm.
> > But it seems that the delete switch doesn't work for clone resources.
> >
> > crm(live)resource# failcount dummy:1 show dl380g5c
> > scope=status  name=fail-count-dummy:1 value=1
> > crm(live)resource# failcount dummy:1 delete dl380g5c
> 
> I can see in the logs this:
> 
> crm_attribute -N dl380g5c -n fail-count-dummy:1 -D -t status -d 0
> 
> Well, that should've deleted the failcount. Unfortunately, I can't
> see anything else in the logs. I think that you should file a bug.
> 
> > crm(live)resource# failcount dummy:1 show dl380g5c
> > scope=status  name=fail-count-dummy:1 value=1
> >
> > set value "0" worked.
> > crm(live)resource# failcount dummy:1 set dl380g5c 0
> > crm(live)resource# failcount dummy:1 show dl380g5c
> > scope=status  name=fail-count-dummy:1 value=0
> >
> > Is this the case only with clone resources?
> 
> Not sure what you mean.
> 
> > And another thing.
> > After setting the value to "0", the failcount was deleted not only for
> > dl380g5c but also for dl380g5d.
> 
> The set value command I can see in the logs is this:
> 
> crm_attribute -N dl380g5c -n fail-count-dummy:1 -v 0 -t status -d 0
> 
> That worked fine. In dl380g5d/pengine/pe-input-4.bz2 I can still
> see that the fail-count for dummy:1 at 5b is set to 1. Then, in
> dl380g5d/pengine/pe-input-5.bz2 it is not set to 0 but gone. I'm
> really not sure what triggered the latter transition. Andrew?
> 
> > I expected that "failcount <rsc> show _<node>_" could specify one node.
> > Is there something wrong in my configuration?
> 
> Sorry, you lost me here as well.
> 
> BTW, I can't find the changeset id from the hb_report in the
> repository:
> 
> CRM Version: 1.0.3 (2e35b8ac90a327c77ff869e1189fc70234213906)
> 
> Thanks,
> 
> Dejan
> 
> > See also the attached hb_report.
> >
> > Best Regards,
> > Junko Ikeda
> >
> > NTT DATA INTELLILINK CORPORATION
> >
> 
> 
> > (1) Initial state
> > 	dummy:0	dl380g5a
> > 	dummy:1	dl380g5b
> >
> > ============
> > Last updated: Thu May 21 17:45:16 2009
> > Current DC: dl380g5d (1a7cfd3b-c885-45a3-b893-b09adb286e5c) - partition with quorum
> > Version: 1.0.3-2e35b8ac90a327c77ff869e1189fc70234213906
> > 4 Nodes configured, unknown expected votes
> > 1 Resources configured.
> > ============
> >
> > Online: [ dl380g5a dl380g5b dl380g5c dl380g5d ]
> >
> > Clone Set: clone
> >         Started: [ dl380g5a dl380g5b ]
> >
> > Operations:
> > * Node dl380g5a:
> >    dummy:0: migration-threshold=1
> >     + (3) start: rc=0 (ok)
> >     + (4) monitor: interval=10000ms rc=0 (ok)
> > * Node dl380g5d:
> > * Node dl380g5b:
> >    dummy:1: migration-threshold=1
> >     + (3) start: rc=0 (ok)
> >     + (4) monitor: interval=10000ms rc=0 (ok)
> > * Node dl380g5c:
> >
> >
> > (2) dummy:1 breaks down, and dummy:1 moves to dl380g5c
> > 	dummy:0	dl380g5a
> > 	dummy:1	dl380g5c
> >
> > ============
> > Last updated: Thu May 21 17:46:21 2009
> > Current DC: dl380g5d (1a7cfd3b-c885-45a3-b893-b09adb286e5c) - partition with quorum
> > Version: 1.0.3-2e35b8ac90a327c77ff869e1189fc70234213906
> > 4 Nodes configured, unknown expected votes
> > 1 Resources configured.
> > ============
> >
> > Online: [ dl380g5a dl380g5b dl380g5c dl380g5d ]
> >
> > Clone Set: clone
> >         Started: [ dl380g5a dl380g5c ]
> >
> > Operations:
> > * Node dl380g5a:
> >    dummy:0: migration-threshold=1
> >     + (3) start: rc=0 (ok)
> >     + (4) monitor: interval=10000ms rc=0 (ok)
> > * Node dl380g5d:
> > * Node dl380g5b:
> >    dummy:1: migration-threshold=1 fail-count=1
> >     + (3) start: rc=0 (ok)
> >     + (4) monitor: interval=10000ms rc=7 (not running)
> >     + (5) stop: rc=0 (ok)
> > * Node dl380g5c:
> >    dummy:1: migration-threshold=1
> >     + (3) start: rc=0 (ok)
> >     + (4) monitor: interval=10000ms rc=0 (ok)
> >
> > Failed actions:
> >     dummy:1_monitor_10000 (node=dl380g5b, call=4, rc=7, status=complete): not running
> >
> >
> > (3) dummy:1 breaks down again, and dummy:1 moves to dl380g5d
> > 	dummy:0	dl380g5a
> > 	dummy:1	dl380g5d
> >
> > ============
> > Last updated: Thu May 21 17:46:51 2009
> > Current DC: dl380g5d (1a7cfd3b-c885-45a3-b893-b09adb286e5c) - partition with quorum
> > Version: 1.0.3-2e35b8ac90a327c77ff869e1189fc70234213906
> > 4 Nodes configured, unknown expected votes
> > 1 Resources configured.
> > ============
> >
> > Online: [ dl380g5a dl380g5b dl380g5c dl380g5d ]
> >
> > Clone Set: clone
> >         Started: [ dl380g5a dl380g5d ]
> >
> > Operations:
> > * Node dl380g5a:
> >    dummy:0: migration-threshold=1
> >     + (3) start: rc=0 (ok)
> >     + (4) monitor: interval=10000ms rc=0 (ok)
> > * Node dl380g5d:
> >    dummy:1: migration-threshold=1
> >     + (3) start: rc=0 (ok)
> >     + (4) monitor: interval=10000ms rc=0 (ok)
> > * Node dl380g5b:
> >    dummy:1: migration-threshold=1 fail-count=1
> >     + (3) start: rc=0 (ok)
> >     + (4) monitor: interval=10000ms rc=7 (not running)
> >     + (5) stop: rc=0 (ok)
> > * Node dl380g5c:
> >    dummy:1: migration-threshold=1 fail-count=1
> >     + (3) start: rc=0 (ok)
> >     + (4) monitor: interval=10000ms rc=7 (not running)
> >     + (5) stop: rc=0 (ok)
> >
> > Failed actions:
> >     dummy:1_monitor_10000 (node=dl380g5b, call=4, rc=7, status=complete): not running
> >     dummy:1_monitor_10000 (node=dl380g5c, call=4, rc=7, status=complete): not running
> >
> >
> > (4) Now, the fail counts for dummy:1 are:
> > 	dl380g5c = 1
> > 	dl380g5d = 1
> >
> > ============
> > Last updated: Thu May 21 17:48:06 2009
> > Current DC: dl380g5d (1a7cfd3b-c885-45a3-b893-b09adb286e5c) - partition with quorum
> > Version: 1.0.3-2e35b8ac90a327c77ff869e1189fc70234213906
> > 4 Nodes configured, unknown expected votes
> > 1 Resources configured.
> > ============
> >
> > Online: [ dl380g5a dl380g5b dl380g5c dl380g5d ]
> >
> > Clone Set: clone
> >         Started: [ dl380g5a dl380g5d ]
> >
> > Operations:
> > * Node dl380g5a:
> >    dummy:0: migration-threshold=1
> >     + (3) start: rc=0 (ok)
> >     + (4) monitor: interval=10000ms rc=0 (ok)
> > * Node dl380g5d:
> >    dummy:1: migration-threshold=1
> >     + (3) start: rc=0 (ok)
> >     + (4) monitor: interval=10000ms rc=0 (ok)
> > * Node dl380g5b:
> >    dummy:1: migration-threshold=1
> >     + (3) start: rc=0 (ok)
> >     + (4) monitor: interval=10000ms rc=7 (not running)
> >     + (5) stop: rc=0 (ok)
> > * Node dl380g5c:
> >    dummy:1: migration-threshold=1
> >     + (3) start: rc=0 (ok)
> >     + (4) monitor: interval=10000ms rc=7 (not running)
> >     + (5) stop: rc=0 (ok)
> >
> > Failed actions:
> >     dummy:1_monitor_10000 (node=dl380g5b, call=4, rc=7, status=complete): not running
> >     dummy:1_monitor_10000 (node=dl380g5c, call=4, rc=7, status=complete): not running
> 
> 
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_report.tar.bz2
Type: application/octet-stream
Size: 64002 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20090522/33fa067b/attachment-0001.obj>

