[Pacemaker] cannot handle the failcount for a clone resource using the crm tool
Dejan Muhamedagic
dejanmm at fastmail.fm
Thu May 21 22:53:25 UTC 2009
Hi Junko-san,
On Thu, May 21, 2009 at 06:32:52PM +0900, Junko IKEDA wrote:
> Hi,
>
> I have 4 nodes (dl380g5a, dl380g5b, dl380g5c, dl380g5d),
> and run 1 clone resource with the following configuration.
> clone_max="2"
> clone_node_max="1"
>
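(For reference, a configuration along these lines could be built with the
crm shell roughly as follows. This is only a sketch: the clone name "clone"
and migration-threshold=1 are taken from the status output below, while the
Dummy agent and the monitor interval are assumptions; note that Pacemaker
spells the meta attributes with hyphens.)

  # define the primitive and clone it across at most two nodes
  crm configure primitive dummy ocf:pacemaker:Dummy \
          op monitor interval="10s"
  crm configure clone clone dummy \
          meta clone-max="2" clone-node-max="1" migration-threshold="1"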
> (1) Initial state
> dummy:0 dl380g5a
> dummy:1 dl380g5b
>
> (2) dummy:1 breaks down and moves to dl380g5c
> dummy:0 dl380g5a
> dummy:1 dl380g5c
>
> (3) dummy:1 breaks down again and moves to dl380g5d
> dummy:0 dl380g5a
> dummy:1 dl380g5d
>
> (4) Now, the failcounts for dummy:1 are:
> dl380g5b = 1
> dl380g5c = 1
>
> I tried to delete the failcount using crm.
> But it seems that the delete command doesn't work for clone resources.
>
> crm(live)resource# failcount dummy:1 show dl380g5c
> scope=status name=fail-count-dummy:1 value=1
> crm(live)resource# failcount dummy:1 delete dl380g5c
I can see this in the logs:
crm_attribute -N dl380g5c -n fail-count-dummy:1 -D -t status -d 0
Well, that should've deleted the failcount. Unfortunately, I can't
see anything else useful in the logs. I think that you should file a bug.
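To check whether the delete actually made it to the CIB, you could query
the attribute directly, bypassing the shell; a sketch, reusing the
attribute name from the command above:

  # query the raw status attribute
  crm_attribute -N dl380g5c -n fail-count-dummy:1 -G -t status
  # or dump the status section and look for any fail-count entries
  cibadmin -Q -o status | grep fail-count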
> crm(live)resource# failcount dummy:1 show dl380g5c
> scope=status name=fail-count-dummy:1 value=1
>
> Setting the value to "0" worked, though:
> crm(live)resource# failcount dummy:1 set dl380g5c 0
> crm(live)resource# failcount dummy:1 show dl380g5c
> scope=status name=fail-count-dummy:1 value=0
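As a possible workaround (a sketch, not from the original thread), cleaning
up the resource should also clear the failcount along with the operation
history:

  crm resource cleanup dummy:1 dl380g5c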
>
> Does this happen only with clone resources?
Not sure what you mean.
> And another thing.
> After setting the value to "0",
> the failcount was deleted not only for dl380g5c but also for dl380g5b.
The set value command I can see in the logs is this:
crm_attribute -N dl380g5c -n fail-count-dummy:1 -v 0 -t status -d 0
That worked fine. In dl380g5d/pengine/pe-input-4.bz2 I can still
see that the fail-count for dummy:1 at 5b is set to 1. Then, in
dl380g5d/pengine/pe-input-5.bz2 it is not set to 0 but gone. I'm
really not sure what triggered the latter transition. Andrew?
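For anyone replaying this, the two transitions can be examined with ptest
(assuming a build that reads the compressed inputs directly; otherwise
decompress them first):

  ptest -VVV -x dl380g5d/pengine/pe-input-4.bz2
  ptest -VVV -x dl380g5d/pengine/pe-input-5.bz2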
> I expected that "failcount <rsc> show _<node>_" would operate on the
> specified node only.
> Is there anything wrong with my configuration?
Sorry, you lost me here as well.
BTW, I can't find the changeset id from the hb_report in the
repository:
CRM Version: 1.0.3 (2e35b8ac90a327c77ff869e1189fc70234213906)
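If you have a local clone of the source tree, the changeset can be looked
up directly; a sketch, assuming a Mercurial checkout of the Pacemaker
stable-1.0 repository:

  hg log -r 2e35b8ac90a3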
Thanks,
Dejan
> See also the attached hb_report.
>
> Best Regards,
> Junko Ikeda
>
> NTT DATA INTELLILINK CORPORATION
>
> (1) Initial state
> dummy:0 dl380g5a
> dummy:1 dl380g5b
>
> ============
> Last updated: Thu May 21 17:45:16 2009
> Current DC: dl380g5d (1a7cfd3b-c885-45a3-b893-b09adb286e5c) - partition with quorum
> Version: 1.0.3-2e35b8ac90a327c77ff869e1189fc70234213906
> 4 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
>
> Online: [ dl380g5a dl380g5b dl380g5c dl380g5d ]
>
> Clone Set: clone
> Started: [ dl380g5a dl380g5b ]
>
> Operations:
> * Node dl380g5a:
> dummy:0: migration-threshold=1
> + (3) start: rc=0 (ok)
> + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5d:
> * Node dl380g5b:
> dummy:1: migration-threshold=1
> + (3) start: rc=0 (ok)
> + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5c:
>
>
> (2) dummy:1 breaks down, and dummy:1 moves to dl380g5c
> dummy:0 dl380g5a
> dummy:1 dl380g5c
>
> ============
> Last updated: Thu May 21 17:46:21 2009
> Current DC: dl380g5d (1a7cfd3b-c885-45a3-b893-b09adb286e5c) - partition with quorum
> Version: 1.0.3-2e35b8ac90a327c77ff869e1189fc70234213906
> 4 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
>
> Online: [ dl380g5a dl380g5b dl380g5c dl380g5d ]
>
> Clone Set: clone
> Started: [ dl380g5a dl380g5c ]
>
> Operations:
> * Node dl380g5a:
> dummy:0: migration-threshold=1
> + (3) start: rc=0 (ok)
> + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5d:
> * Node dl380g5b:
> dummy:1: migration-threshold=1 fail-count=1
> + (3) start: rc=0 (ok)
> + (4) monitor: interval=10000ms rc=7 (not running)
> + (5) stop: rc=0 (ok)
> * Node dl380g5c:
> dummy:1: migration-threshold=1
> + (3) start: rc=0 (ok)
> + (4) monitor: interval=10000ms rc=0 (ok)
>
> Failed actions:
> dummy:1_monitor_10000 (node=dl380g5b, call=4, rc=7, status=complete): not running
>
>
> (3) dummy:1 breaks down again, and dummy:1 moves to dl380g5d
> dummy:0 dl380g5a
> dummy:1 dl380g5d
>
> ============
> Last updated: Thu May 21 17:46:51 2009
> Current DC: dl380g5d (1a7cfd3b-c885-45a3-b893-b09adb286e5c) - partition with quorum
> Version: 1.0.3-2e35b8ac90a327c77ff869e1189fc70234213906
> 4 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
>
> Online: [ dl380g5a dl380g5b dl380g5c dl380g5d ]
>
> Clone Set: clone
> Started: [ dl380g5a dl380g5d ]
>
> Operations:
> * Node dl380g5a:
> dummy:0: migration-threshold=1
> + (3) start: rc=0 (ok)
> + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5d:
> dummy:1: migration-threshold=1
> + (3) start: rc=0 (ok)
> + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5b:
> dummy:1: migration-threshold=1 fail-count=1
> + (3) start: rc=0 (ok)
> + (4) monitor: interval=10000ms rc=7 (not running)
> + (5) stop: rc=0 (ok)
> * Node dl380g5c:
> dummy:1: migration-threshold=1 fail-count=1
> + (3) start: rc=0 (ok)
> + (4) monitor: interval=10000ms rc=7 (not running)
> + (5) stop: rc=0 (ok)
>
> Failed actions:
> dummy:1_monitor_10000 (node=dl380g5b, call=4, rc=7, status=complete): not running
> dummy:1_monitor_10000 (node=dl380g5c, call=4, rc=7, status=complete): not running
>
>
> (4) Now, the failcounts for dummy:1 are:
> dl380g5b = 1
> dl380g5c = 1
>
> ============
> Last updated: Thu May 21 17:48:06 2009
> Current DC: dl380g5d (1a7cfd3b-c885-45a3-b893-b09adb286e5c) - partition with quorum
> Version: 1.0.3-2e35b8ac90a327c77ff869e1189fc70234213906
> 4 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
>
> Online: [ dl380g5a dl380g5b dl380g5c dl380g5d ]
>
> Clone Set: clone
> Started: [ dl380g5a dl380g5d ]
>
> Operations:
> * Node dl380g5a:
> dummy:0: migration-threshold=1
> + (3) start: rc=0 (ok)
> + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5d:
> dummy:1: migration-threshold=1
> + (3) start: rc=0 (ok)
> + (4) monitor: interval=10000ms rc=0 (ok)
> * Node dl380g5b:
> dummy:1: migration-threshold=1
> + (3) start: rc=0 (ok)
> + (4) monitor: interval=10000ms rc=7 (not running)
> + (5) stop: rc=0 (ok)
> * Node dl380g5c:
> dummy:1: migration-threshold=1
> + (3) start: rc=0 (ok)
> + (4) monitor: interval=10000ms rc=7 (not running)
> + (5) stop: rc=0 (ok)
>
> Failed actions:
> dummy:1_monitor_10000 (node=dl380g5b, call=4, rc=7, status=complete): not running
> dummy:1_monitor_10000 (node=dl380g5c, call=4, rc=7, status=complete): not running