[Pacemaker] [Problem]Reboot by the error of the clone resource influences the resource of other nodes.

Thu Mar 31 03:05:28 EDT 2011

Hi,

31.03.2011 04:15, renayama19661014 at ybb.ne.jp wrote:
[...]
> Node srv01 (45f985d7-e7c8-4834-b01b-16b99526672b): online
>         main_rsc        (ocf::pacemaker:Dummy) Started 
>         prmDummy1:0     (ocf::pacemaker:Dummy) Started 
>         prmPingd:0      (ocf::pacemaker:ping) Started 
> Node srv02 (ed7fdcbf-9c17-4f31-8a27-a831a6b39ed5): online
>         prmDummy1:1     (ocf::pacemaker:Dummy) Started 
>         main_rsc2       (ocf::pacemaker:Dummy) Started 
>         prmPingd:1      (ocf::pacemaker:ping) Started 
> Node srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311): online
>         prmDummy1:2     (ocf::pacemaker:Dummy) Started 
>         prmPingd:2      (ocf::pacemaker:ping) Started 
[...]
> Node srv01 (45f985d7-e7c8-4834-b01b-16b99526672b): online
> Node srv02 (ed7fdcbf-9c17-4f31-8a27-a831a6b39ed5): online
>         prmDummy1:1     (ocf::pacemaker:Dummy) Started     ---------> :1(funny)
>         prmPingd:0      (ocf::pacemaker:ping) Started      ---------> :0(funny)
> Node srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311): online
>         main_rsc        (ocf::pacemaker:Dummy) Started 
>         prmDummy1:2     (ocf::pacemaker:Dummy) Started     ---------> :2(funny)
>         prmPingd:1      (ocf::pacemaker:ping) Started      ---------> :1(funny)
>
> We think the reboot of pingd to be unnecessary in a srv02 node. 
> Is there the method how this problem is settled?

I observe this problem too (with the latest 1.1 tip):
pengine unnecessarily decides to swap anonymous clone instances between
nodes when it rearranges cluster resources. This causes all dependent
resources on those nodes to be stopped and started again.

In your case it swapped
srv02:prmPingd:1,srv03:prmPingd:2 <-> srv02:prmPingd:0,srv03:prmPingd:1
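
To make the renumbering explicit, here is a small Python sketch. It is not
anything Pacemaker provides: the two dictionaries are hand-copied from the
status listings quoted above, and `renumbered` is a hypothetical helper name.
It lists the nodes where a prmPingd instance is present in both listings but
under a different instance number:

```python
# Hand-copied from the quoted status output: node -> prmPingd instance number.
before = {"srv01": 0, "srv02": 1, "srv03": 2}
after = {"srv02": 0, "srv03": 1}


def renumbered(before, after):
    """Return {node: (old, new)} for nodes that host the clone in both
    snapshots but under a different instance number."""
    return {
        node: (before[node], after[node])
        for node in sorted(before.keys() & after.keys())
        if before[node] != after[node]
    }


for node, (old, new) in renumbered(before, after).items():
    print(f"{node}: prmPingd:{old} -> prmPingd:{new}")
```

Both surviving nodes show up, which is exactly the unnecessary restart you
describe for srv02.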

In my case I often see something like this:

Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:0#011(Started v02-c -> v02-d)
Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:1#011(Started v02-d -> v02-a)
Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:2#011(Started v02-a -> v02-b)
Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:3#011(Started v02-b -> v02-c)
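
A rough way to spot this pattern in logs is sketched below in Python. This is
only an illustration under my own assumptions: the regular expression is
written against the four lines above (with the wrapped lines joined), and
`parse_moves` and `is_pure_rotation` are hypothetical helper names, not part
of any Pacemaker tool. The check is simply whether the set of source nodes
equals the set of target nodes, in which case the same nodes run the clone
before and after and the moves only shuffle instance numbers.

```python
import re

# The four LogActions lines from above, with the wrapped lines joined.
LOG = """\
Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:0#011(Started v02-c -> v02-d)
Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:1#011(Started v02-d -> v02-a)
Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:2#011(Started v02-a -> v02-b)
Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:3#011(Started v02-b -> v02-c)
"""

# "#011" is how the tab character appears in these syslog lines.
MOVE = re.compile(r"Move resource (\S+?):(\d+)#011\(Started (\S+) -> (\S+)\)")


def parse_moves(log):
    """Return a list of (clone, instance, source, target) tuples."""
    return [(m[1], int(m[2]), m[3], m[4]) for m in MOVE.finditer(log)]


def is_pure_rotation(moves):
    """True if every source node is also a target node and vice versa,
    i.e. the clone ends up on exactly the nodes it started on."""
    sources = {src for _, _, src, _ in moves}
    targets = {dst for _, _, _, dst in moves}
    return bool(moves) and sources == targets and len(sources) == len(moves)


moves = parse_moves(LOG)
print(is_pure_rotation(moves))  # True for the four lines above
```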

I contacted Andrew about this directly some time ago (with an hb_report),
but did not have the energy to raise this problem on the ML (which is what
he actually asked me to do) :( .

I suspect this is 1.1-specific, but that is only a feeling.

Maybe somebody familiar with Mercurial can bisect to find when this bug was
introduced?

Best,
Vladislav