[Pacemaker] Clone resource dependency issue - undesired restart of dependent resources
Ron Kerry
rkerry at sgi.com
Wed Mar 2 21:27:05 CET 2011
I have narrowed this down to what I believe is really a bug in the way Pacemaker handles
constraints made between resource groups, as opposed to resource primitives. The version of
Pacemaker involved here is:
libpacemaker3-1.1.2-0.7.1
pacemaker-1.1.2-0.7.1
A configuration with colocation/order constraints between a clone and simple primitives exhibits
proper failover behavior:
Online: [ elvis queen ]
Clone Set: A-clone [A]
Started: [ elvis queen ]
B-1 (ocf::rgk:typeB): Started elvis
B-2 (ocf::rgk:typeB): Started queen
Clone Set: stonith-l2network-set [stonith-l2network]
Started: [ elvis queen ]
<constraints>
<rsc_colocation id="A-with-B-1" rsc="B-1" score="INFINITY" with-rsc="A-clone"/>
<rsc_colocation id="A-with-B-2" rsc="B-2" score="INFINITY" with-rsc="A-clone"/>
<rsc_order first="A-clone" id="A-before-B-1" symmetrical="true" then="B-1"/>
<rsc_order first="A-clone" id="A-before-B-2" symmetrical="true" then="B-2"/>
</constraints>
The following log is from after queen came back into the cluster after being reset:
Mar 1 16:07:03 elvis pengine: [4218]: info: determine_online_status: Node elvis is online
Mar 1 16:07:03 elvis pengine: [4218]: info: determine_online_status: Node queen is online
Mar 1 16:07:03 elvis pengine: [4218]: notice: clone_print: Clone Set: A-clone [A]
Mar 1 16:07:03 elvis pengine: [4218]: notice: short_print: Started: [ elvis ]
Mar 1 16:07:03 elvis pengine: [4218]: notice: short_print: Stopped: [ A:1 ]
Mar 1 16:07:03 elvis pengine: [4218]: notice: native_print: B-1 (ocf::rgk:typeB): Started elvis
Mar 1 16:07:03 elvis pengine: [4218]: notice: native_print: B-2 (ocf::rgk:typeB): Started elvis
Mar 1 16:07:03 elvis pengine: [4218]: notice: clone_print: Clone Set: stonith-l2network-set [stonith-l2network]
Mar 1 16:07:03 elvis pengine: [4218]: notice: short_print: Started: [ elvis ]
Mar 1 16:07:03 elvis pengine: [4218]: notice: short_print: Stopped: [ stonith-l2network:1 ]
Mar 1 16:07:03 elvis pengine: [4218]: notice: LogActions: Leave resource A:0 (Started elvis)
Mar 1 16:07:03 elvis pengine: [4218]: notice: LogActions: Start A:1 (queen)
Mar 1 16:07:03 elvis pengine: [4218]: notice: LogActions: Leave resource B-1 (Started elvis)
Mar 1 16:07:03 elvis pengine: [4218]: notice: LogActions: Leave resource B-2 (Started elvis)
Mar 1 16:07:03 elvis pengine: [4218]: notice: LogActions: Leave resource stonith-l2network:0 (Started elvis)
Mar 1 16:07:03 elvis pengine: [4218]: notice: LogActions: Start stonith-l2network:1 (queen)
Note that the dependent B-1 and B-2 resources are left running where they were. This is proper and
expected failover behavior given they have resource-stickiness set.
However, the same configuration with each simple primitive replaced by a group of two primitives
exhibits incorrect failover behavior:
Online: [ elvis queen ]
Clone Set: AZ-clone [AZ-group]
Started: [ elvis queen ]
Resource Group: BC-group-1
B-1 (ocf::rgk:typeB): Started elvis
C-1 (ocf::rgk:typeC): Started elvis
Clone Set: stonith-l2network-set [stonith-l2network]
Started: [ elvis queen ]
Resource Group: BC-group-2
B-2 (ocf::rgk:typeB): Started queen
C-2 (ocf::rgk:typeC): Started queen
<constraints>
<rsc_colocation id="AZ-with-BC-group-1" rsc="BC-group-1" score="INFINITY" with-rsc="AZ-clone"/>
<rsc_colocation id="AZ-with-BC-group-2" rsc="BC-group-2" score="INFINITY" with-rsc="AZ-clone"/>
<rsc_order first="AZ-clone" id="AZ-before-BC-group-1" symmetrical="true" then="BC-group-1"/>
<rsc_order first="AZ-clone" id="AZ-before-BC-group-2" symmetrical="true" then="BC-group-2"/>
</constraints>
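The status output above shows only the resulting layout, not the resource definitions. For readers reproducing this, a hypothetical resources section consistent with that output might look like the sketch below. Assumptions, not taken from the original configuration: the agent types typeA and typeZ (the output only names the group members A and Z), all meta_attributes/nvpair ids, placing resource-stickiness="1" on the groups (the value is borrowed from the earlier quoted crm example, where it was set on primitives), and clone-max="2" (also from that example). Operations and the stonith clone are omitted.

```xml
<resources>
  <!-- Hypothetical: typeA/typeZ are assumed agent names for group members A and Z -->
  <clone id="AZ-clone">
    <meta_attributes id="AZ-clone-meta">
      <nvpair id="AZ-clone-clone-max" name="clone-max" value="2"/>
    </meta_attributes>
    <group id="AZ-group">
      <primitive id="A" class="ocf" provider="rgk" type="typeA"/>
      <primitive id="Z" class="ocf" provider="rgk" type="typeZ"/>
    </group>
  </clone>
  <!-- Dependent groups; stickiness is set on each group here as an assumption -->
  <group id="BC-group-1">
    <meta_attributes id="BC-group-1-meta">
      <nvpair id="BC-group-1-stickiness" name="resource-stickiness" value="1"/>
    </meta_attributes>
    <primitive id="B-1" class="ocf" provider="rgk" type="typeB"/>
    <primitive id="C-1" class="ocf" provider="rgk" type="typeC"/>
  </group>
  <group id="BC-group-2">
    <meta_attributes id="BC-group-2-meta">
      <nvpair id="BC-group-2-stickiness" name="resource-stickiness" value="1"/>
    </meta_attributes>
    <primitive id="B-2" class="ocf" provider="rgk" type="typeB"/>
    <primitive id="C-2" class="ocf" provider="rgk" type="typeC"/>
  </group>
</resources>
```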
Mar 2 12:44:43 elvis pengine: [4218]: info: determine_online_status: Node elvis is online
Mar 2 12:44:43 elvis pengine: [4218]: info: determine_online_status: Node queen is online
Mar 2 12:44:43 elvis pengine: [4218]: notice: clone_print: Clone Set: AZ-clone [AZ-group]
Mar 2 12:44:43 elvis pengine: [4218]: notice: short_print: Started: [ elvis ]
Mar 2 12:44:43 elvis pengine: [4218]: notice: short_print: Stopped: [ AZ-group:1 ]
Mar 2 12:44:43 elvis pengine: [4218]: notice: group_print: Resource Group: BC-group-1
Mar 2 12:44:43 elvis pengine: [4218]: notice: native_print: B-1 (ocf::rgk:typeB): Started elvis
Mar 2 12:44:43 elvis pengine: [4218]: notice: native_print: C-1 (ocf::rgk:typeC): Started elvis
Mar 2 12:44:43 elvis pengine: [4218]: notice: clone_print: Clone Set: stonith-l2network-set [stonith-l2network]
Mar 2 12:44:43 elvis pengine: [4218]: notice: short_print: Started: [ elvis ]
Mar 2 12:44:43 elvis pengine: [4218]: notice: short_print: Stopped: [ stonith-l2network:1 ]
Mar 2 12:44:43 elvis pengine: [4218]: notice: group_print: Resource Group: BC-group-2
Mar 2 12:44:43 elvis pengine: [4218]: notice: native_print: B-2 (ocf::rgk:typeB): Started elvis
Mar 2 12:44:43 elvis pengine: [4218]: notice: native_print: C-2 (ocf::rgk:typeC): Started elvis
Mar 2 12:44:43 elvis pengine: [4218]: notice: LogActions: Leave resource A:0 (Started elvis)
Mar 2 12:44:43 elvis pengine: [4218]: notice: LogActions: Leave resource Z:0 (Started elvis)
Mar 2 12:44:43 elvis pengine: [4218]: notice: LogActions: Start A:1 (queen)
Mar 2 12:44:43 elvis pengine: [4218]: notice: LogActions: Start Z:1 (queen)
Mar 2 12:44:43 elvis pengine: [4218]: notice: LogActions: Restart resource B-1 (Started elvis)
Mar 2 12:44:43 elvis pengine: [4218]: notice: LogActions: Restart resource C-1 (Started elvis)
Mar 2 12:44:43 elvis pengine: [4218]: notice: LogActions: Leave resource stonith-l2network:0 (Started elvis)
Mar 2 12:44:43 elvis pengine: [4218]: notice: LogActions: Start stonith-l2network:1 (queen)
Mar 2 12:44:43 elvis pengine: [4218]: notice: LogActions: Restart resource B-2 (Started elvis)
Mar 2 12:44:43 elvis pengine: [4218]: notice: LogActions: Restart resource C-2 (Started elvis)
Note that the dependent B-1/C-1 and B-2/C-2 resources are restarted. This is incorrect failover
behavior given they have resource-stickiness set.
This appears to me to be a clear bug. Pacemaker should not be handling groups any differently than
it does primitives!
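For comparison, the advisory-ordering workaround from the quoted messages below (score="0" order constraints, written there in crm shell syntax for primitives) would presumably translate to the group constraints above roughly as sketched here. Only the two rsc_order lines change. This is an illustration, not a configuration reported as tested with groups in this thread. Also note the caveat in the quoted follow-up: an advisory order only takes effect when both resources are starting or stopping in the same transition, so a restart of the clone instance alone is no longer propagated to the dependent group.

```xml
<constraints>
  <rsc_colocation id="AZ-with-BC-group-1" rsc="BC-group-1" score="INFINITY" with-rsc="AZ-clone"/>
  <rsc_colocation id="AZ-with-BC-group-2" rsc="BC-group-2" score="INFINITY" with-rsc="AZ-clone"/>
  <!-- score="0" makes each order advisory (optional) instead of mandatory -->
  <rsc_order first="AZ-clone" id="AZ-before-BC-group-1" score="0" symmetrical="true" then="BC-group-1"/>
  <rsc_order first="AZ-clone" id="AZ-before-BC-group-2" score="0" symmetrical="true" then="BC-group-2"/>
</constraints>
```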
On 3/1/2011 5:48 PM, Ron Kerry wrote:
> On 3/1/2011 2:39 PM, Ron Kerry wrote:
> > On 2/28/2011 2:33 PM, Ron Kerry wrote:
> >> Folks -
> >>
> >> I have a configuration issue that I am unsure how to resolve. Consider the following set of
> >> resources.
> >>
> >> clone rsc1-clone rsc1 \
> >> meta clone-max="2" target-role="Started"
> >> primitive rsc1 ...
> >> primitive rsc2 ... meta resource-stickiness="1"
> >> primitive rsc3 ... meta resource-stickiness="1"
> >>
> >> Plus the following constraints
> >>
> >> colocation rsc2-with-clone inf: rsc2 rsc1-clone
> >> colocation rsc3-with-clone inf: rsc3 rsc1-clone
> >> order clone-before-rsc2 : rsc1-clone rsc2
> >> order clone-before-rsc3 : rsc1-clone rsc3
> >>
> >>
> >> I am getting the following behavior that is undesirable.
> >>
> >> During normal operation, a copy of the rsc1 resource runs on each of my two systems, with rsc2 and
> >> rsc3 typically split between the two systems. The rsc2 and rsc3 resources are operationally
> >> dependent on a copy of rsc1 being up and running first.
> >>
> >>  SystemA    SystemB
> >>  =======    =======
> >>  rsc1       rsc1
> >>  rsc2       rsc3
> >>
> >> If SystemB goes down, then rsc3 moves over to SystemA as expected
> >>
> >>  SystemA    SystemB
> >>  =======    =======
> >>  rsc1       X   X
> >>  rsc2         X
> >>  rsc3       X   X
> >>
> >> When SystemB comes back into the cluster, crmd starts the rsc1 clone on SystemB but then also
> >> restarts both rsc2 & rsc3. This means both are stopped and then both started again. This is not what
> >> we want. We want these resources to remain running on SystemA until one of them is moved manually by
> >> an administrator to re-balance them across the systems.
> >>
> >> How do we configure these resources/constraints to achieve that behavior? We are already using
> >> resource-stickiness, but that is meaningless if crmd is going to restart these resources
> >> anyway.
> >>
> >
> > Using advisory (score="0") order constraints seems to achieve the correct behavior. I have not done
> > extensive testing yet to see whether other failover behaviors are broken by this approach, but
> > initial basic testing looks good. It is always nice to answer one's own questions :-)
> >
> > colocation rsc2-with-clone inf: rsc2 rsc1-clone
> > colocation rsc3-with-clone inf: rsc3 rsc1-clone
> > order clone-before-rsc2 0: rsc1-clone rsc2
> > order clone-before-rsc3 0: rsc1-clone rsc3
> >
> > Does anyone know of any specific problems with this approach??
> >
> >
>
> I set up a greatly simplified generic resource configuration:
>
> Online: [ elvis queen ]
> Clone Set: A-clone [A]
> Started: [ elvis queen ]
> B-1 (ocf::rgk:typeB): Started elvis
> B-2 (ocf::rgk:typeB): Started queen
> Clone Set: stonith-l2network-set [stonith-l2network]
> Started: [ elvis queen ]
>
> The A and B resources are just shell scripts running an infinite while loop whose body is a
> sleep 5 command, so they run forever but do not consume machine resources.
>
> If I kill the A-clone instance running on queen, it just gets restarted and nothing at all happens to
> B-2 (it stays on queen and never knows any different). This is not optimal behavior for our purposes.
>
> However on the good side, if the A-clone cannot (re)start on queen, then B-2 does fail over to elvis
> as we expect.
>
> Does anybody have any ideas about how to get the proper behavior in all cases?
>
--
Ron Kerry rkerry at sgi.com
Global Product Support - SGI Federal