[Pacemaker] [Problem]Reboot by the error of the clone resource influences the resource of other nodes.

Thu Mar 31 04:07:03 EDT 2011

Hi Vladislav,

Thank you for your comment.

On our side, this problem occurs with both 1.0.10 and the tip of 1.0.

However, the problem may have been present since a considerably earlier version.

Let's wait for Andrew's comment.

Best Regards,
Hideo Yamauchi.

--- On Thu, 2011/3/31, Vladislav Bogdanov <bubble at hoster-ok.com> wrote:

> Hi,
> 
> 31.03.2011 04:15, renayama19661014 at ybb.ne.jp wrote:
> [...]
> > Node srv01 (45f985d7-e7c8-4834-b01b-16b99526672b): online
> >         main_rsc        (ocf::pacemaker:Dummy) Started 
> >         prmDummy1:0     (ocf::pacemaker:Dummy) Started 
> >         prmPingd:0      (ocf::pacemaker:ping) Started 
> > Node srv02 (ed7fdcbf-9c17-4f31-8a27-a831a6b39ed5): online
> >         prmDummy1:1     (ocf::pacemaker:Dummy) Started 
> >         main_rsc2       (ocf::pacemaker:Dummy) Started 
> >         prmPingd:1      (ocf::pacemaker:ping) Started 
> > Node srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311): online
> >         prmDummy1:2     (ocf::pacemaker:Dummy) Started 
> >         prmPingd:2      (ocf::pacemaker:ping) Started 
> [...]
> > Node srv01 (45f985d7-e7c8-4834-b01b-16b99526672b): online
> > Node srv02 (ed7fdcbf-9c17-4f31-8a27-a831a6b39ed5): online
> >         prmDummy1:1     (ocf::pacemaker:Dummy) Started     ---------> :1(funny)
> >         prmPingd:0      (ocf::pacemaker:ping) Started      ---------> :0(funny)
> > Node srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311): online
> >         main_rsc        (ocf::pacemaker:Dummy) Started 
> >         prmDummy1:2     (ocf::pacemaker:Dummy) Started     ---------> :2(funny)
> >         prmPingd:1      (ocf::pacemaker:ping) Started      ---------> :1(funny)
> >
> > We think the restart of pingd on node srv02 is unnecessary.
> > Is there a way to resolve this problem?
> 
> I observe this problem too (with the latest 1.1 tip):
> pengine unnecessarily decides to swap anonymous clone instances between
> nodes when it rearranges cluster resources. This causes all dependent
> resources on those nodes to be stopped and started again.
> 
> In your case it swapped
> srv02:prmPingd:1,srv03:prmPingd:2 <-> srv02:prmPingd:0,srv03:prmPingd:1
> 
> In my case I often see something like this:
> 
> Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:0#011(Started v02-c -> v02-d)
> Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:1#011(Started v02-d -> v02-a)
> Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:2#011(Started v02-a -> v02-b)
> Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:3#011(Started v02-b -> v02-c)
> 
> I contacted Andrew about this directly some time ago (with hb_report),
> but did not have the energy to raise this problem on the ML (which is
> what he actually asked me to do) :( .
> 
> I suspect this is 1.1-specific, but this is solely a feeling.
> 
> Maybe somebody familiar with Mercurial can bisect to find when this bug
> was introduced?
> 
> Best,
> Vladislav
> 
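
The instance shuffling described in the quoted message (clone instances moving between nodes while the set of nodes running the clone stays the same) could be spotted mechanically in pengine logs. Below is a minimal sketch in Python, not part of Pacemaker. It assumes each "LogActions: Move resource ..." entry sits on a single line in the form quoted above, with "#011" being the syslog-escaped tab. The function name find_instance_shuffles and the regular expression are hypothetical helpers introduced only for illustration.

```python
import re
from collections import defaultdict

# Matches e.g. "Move resource libvirtd:0#011(Started v02-c -> v02-d)".
# "#011" is the syslog-escaped tab seen in the quoted log; plain
# whitespace is accepted as well.
MOVE_RE = re.compile(
    r"Move\s+resource\s+(?P<base>[^\s:#]+):(?P<inst>\d+)"
    r"(?:#011|\s)*\(Started\s+(?P<src>\S+)\s+->\s+(?P<dst>[^\s)]+)\)"
)


def find_instance_shuffles(log_text):
    """Return {clone_name: [(instance, src, dst), ...]} for clones whose
    moves only permute instances over the same set of nodes."""
    moves = defaultdict(list)
    for m in MOVE_RE.finditer(log_text):
        moves[m["base"]].append((int(m["inst"]), m["src"], m["dst"]))

    shuffles = {}
    for base, items in moves.items():
        sources = sorted(src for _, src, _ in items)
        targets = sorted(dst for _, _, dst in items)
        # Same nodes before and after: nothing gained, only restarts.
        if sources == targets:
            shuffles[base] = sorted(items)
    return shuffles


# The four log entries quoted in the thread, one entry per line.
SAMPLE = """\
Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:0#011(Started v02-c -> v02-d)
Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:1#011(Started v02-d -> v02-a)
Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:2#011(Started v02-a -> v02-b)
Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:3#011(Started v02-b -> v02-c)
"""

if __name__ == "__main__":
    for clone, items in find_instance_shuffles(SAMPLE).items():
        print(clone, "instances were only shuffled:", items)
```

A clone reported by this check had every instance moved, yet ends up running on exactly the same nodes as before, which is the unnecessary stop/start pattern discussed in this thread. A genuine relocation (a node appearing only as source or only as target) is not reported.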