[Pacemaker] Clone resource dependency issue - undesired restart of dependent resources

Wed Mar 2 15:27:05 EST 2011

I have narrowed this down to what I feel is really a bug in the way Pacemaker deals with
constraints made against resource groups as opposed to resource primitives. The version of
Pacemaker involved here is:
   libpacemaker3-1.1.2-0.7.1
   pacemaker-1.1.2-0.7.1

A configuration which involves colocation/order constraints made between a clone and a simple 
primitive exhibits proper failover behavior.

   Online: [ elvis queen ]
    Clone Set: A-clone [A]
        Started: [ elvis queen ]
    B-1    (ocf::rgk:typeB):       Started elvis
    B-2    (ocf::rgk:typeB):       Started queen
    Clone Set: stonith-l2network-set [stonith-l2network]
        Started: [ elvis queen ]

     <constraints>
       <rsc_colocation id="A-with-B-1" rsc="B-1" score="INFINITY" with-rsc="A-clone"/>
       <rsc_colocation id="A-with-B-2" rsc="B-2" score="INFINITY" with-rsc="A-clone"/>
       <rsc_order first="A-clone" id="A-before-B-1" symmetrical="true" then="B-1"/>
       <rsc_order first="A-clone" id="A-before-B-2" symmetrical="true" then="B-2"/>
     </constraints>

This is the pengine output from after queen came back into the cluster after being reset.

Mar  1 16:07:03 elvis pengine: [4218]: info: determine_online_status: Node elvis is online
Mar  1 16:07:03 elvis pengine: [4218]: info: determine_online_status: Node queen is online
Mar  1 16:07:03 elvis pengine: [4218]: notice: clone_print:  Clone Set: A-clone [A]
Mar  1 16:07:03 elvis pengine: [4218]: notice: short_print:      Started: [ elvis ]
Mar  1 16:07:03 elvis pengine: [4218]: notice: short_print:      Stopped: [ A:1 ]
Mar  1 16:07:03 elvis pengine: [4218]: notice: native_print: B-1        (ocf::rgk:typeB): Started elvis
Mar  1 16:07:03 elvis pengine: [4218]: notice: native_print: B-2        (ocf::rgk:typeB): Started elvis
Mar  1 16:07:03 elvis pengine: [4218]: notice: clone_print:  Clone Set: stonith-l2network-set [stonith-l2network]
Mar  1 16:07:03 elvis pengine: [4218]: notice: short_print:      Started: [ elvis ]
Mar  1 16:07:03 elvis pengine: [4218]: notice: short_print:      Stopped: [ stonith-l2network:1 ]
Mar  1 16:07:03 elvis pengine: [4218]: notice: LogActions: Leave resource A:0   (Started elvis)
Mar  1 16:07:03 elvis pengine: [4218]: notice: LogActions: Start A:1    (queen)
Mar  1 16:07:03 elvis pengine: [4218]: notice: LogActions: Leave resource B-1   (Started elvis)
Mar  1 16:07:03 elvis pengine: [4218]: notice: LogActions: Leave resource B-2   (Started elvis)
Mar  1 16:07:03 elvis pengine: [4218]: notice: LogActions: Leave resource stonith-l2network:0 (Started elvis)
Mar  1 16:07:03 elvis pengine: [4218]: notice: LogActions: Start stonith-l2network:1    (queen)

Note that the dependent B-1 and B-2 resources are left running where they were. This is proper and 
expected failover behavior given they have resource-stickiness set.

The same configuration, with each simple primitive replaced by a group of two primitives, exhibits
incorrect failover behavior.

   Online: [ elvis queen ]
    Clone Set: AZ-clone [AZ-group]
        Started: [ elvis queen ]
    Resource Group: BC-group-1
        B-1	(ocf::rgk:typeB):	Started elvis
        C-1	(ocf::rgk:typeC):	Started elvis
    Clone Set: stonith-l2network-set [stonith-l2network]
        Started: [ elvis queen ]
    Resource Group: BC-group-2
        B-2	(ocf::rgk:typeB):	Started queen
        C-2	(ocf::rgk:typeC):	Started queen

     <constraints>
       <rsc_colocation id="AZ-with-BC-group-1" rsc="BC-group-1" score="INFINITY" with-rsc="AZ-clone"/>
       <rsc_colocation id="AZ-with-BC-group-2" rsc="BC-group-2" score="INFINITY" with-rsc="AZ-clone"/>
       <rsc_order first="AZ-clone" id="AZ-before-BC-group-1" symmetrical="true" then="BC-group-1"/>
       <rsc_order first="AZ-clone" id="AZ-before-BC-group-2" symmetrical="true" then="BC-group-2"/>
     </constraints>

Mar  2 12:44:43 elvis pengine: [4218]: info: determine_online_status: Node elvis is online
Mar  2 12:44:43 elvis pengine: [4218]: info: determine_online_status: Node queen is online
Mar  2 12:44:43 elvis pengine: [4218]: notice: clone_print:  Clone Set: AZ-clone [AZ-group]
Mar  2 12:44:43 elvis pengine: [4218]: notice: short_print:      Started: [ elvis ]
Mar  2 12:44:43 elvis pengine: [4218]: notice: short_print:      Stopped: [ AZ-group:1 ]
Mar  2 12:44:43 elvis pengine: [4218]: notice: group_print:  Resource Group: BC-group-1
Mar  2 12:44:43 elvis pengine: [4218]: notice: native_print:      B-1   (ocf::rgk:typeB): Started elvis
Mar  2 12:44:43 elvis pengine: [4218]: notice: native_print:      C-1   (ocf::rgk:typeC): Started elvis
Mar  2 12:44:43 elvis pengine: [4218]: notice: clone_print:  Clone Set: stonith-l2network-set [stonith-l2network]
Mar  2 12:44:43 elvis pengine: [4218]: notice: short_print:      Started: [ elvis ]
Mar  2 12:44:43 elvis pengine: [4218]: notice: short_print:      Stopped: [ stonith-l2network:1 ]
Mar  2 12:44:43 elvis pengine: [4218]: notice: group_print:  Resource Group: BC-group-2
Mar  2 12:44:43 elvis pengine: [4218]: notice: native_print:      B-2   (ocf::rgk:typeB): Started elvis
Mar  2 12:44:43 elvis pengine: [4218]: notice: native_print:      C-2   (ocf::rgk:typeC): Started elvis
Mar  2 12:44:43 elvis pengine: [4218]: notice: LogActions: Leave resource A:0   (Started elvis)
Mar  2 12:44:43 elvis pengine: [4218]: notice: LogActions: Leave resource Z:0   (Started elvis)
Mar  2 12:44:43 elvis pengine: [4218]: notice: LogActions: Start A:1    (queen)
Mar  2 12:44:43 elvis pengine: [4218]: notice: LogActions: Start Z:1    (queen)
Mar  2 12:44:43 elvis pengine: [4218]: notice: LogActions: Restart resource B-1 (Started elvis)
Mar  2 12:44:43 elvis pengine: [4218]: notice: LogActions: Restart resource C-1 (Started elvis)
Mar  2 12:44:43 elvis pengine: [4218]: notice: LogActions: Leave resource stonith-l2network:0 (Started elvis)
Mar  2 12:44:43 elvis pengine: [4218]: notice: LogActions: Start stonith-l2network:1    (queen)
Mar  2 12:44:43 elvis pengine: [4218]: notice: LogActions: Restart resource B-2 (Started elvis)
Mar  2 12:44:43 elvis pengine: [4218]: notice: LogActions: Restart resource C-2 (Started elvis)

Note that the dependent B-1/C-1 and B-2/C-2 resources are restarted. This is incorrect failover 
behavior given they have resource-stickiness set.
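
For reference, the stickiness in question is a meta attribute on each group. The resource section is
not shown above; a minimal sketch of what one group might look like follows. The stickiness value of
1 is taken from the quoted thread below, and the meta_attributes/nvpair ids are hypothetical.

```xml
<!-- Sketch only: ids on meta_attributes/nvpair are made up for illustration -->
<group id="BC-group-1">
  <meta_attributes id="BC-group-1-meta_attributes">
    <!-- Assumed value; the quoted thread below uses resource-stickiness="1" -->
    <nvpair id="BC-group-1-resource-stickiness" name="resource-stickiness" value="1"/>
  </meta_attributes>
  <!-- Agent class/provider/type as reported in the status output above -->
  <primitive id="B-1" class="ocf" provider="rgk" type="typeB"/>
  <primitive id="C-1" class="ocf" provider="rgk" type="typeC"/>
</group>
```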

This appears to me to be a clear bug. Pacemaker should not be handling groups any differently than 
it does primitives!
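
One thing that may be worth trying against the group configuration is the advisory-ordering
workaround from the quoted thread below. The following is an untested sketch: it reuses the
constraint ids from above and assumes that score="0" on rsc_order is the XML form of the crm
shell's "0:" order score.

```xml
<constraints>
  <!-- Colocation constraints unchanged from the group configuration above -->
  <rsc_colocation id="AZ-with-BC-group-1" rsc="BC-group-1" score="INFINITY" with-rsc="AZ-clone"/>
  <rsc_colocation id="AZ-with-BC-group-2" rsc="BC-group-2" score="INFINITY" with-rsc="AZ-clone"/>
  <!-- Ordering made advisory via score="0" (assumption; not verified with groups) -->
  <rsc_order first="AZ-clone" id="AZ-before-BC-group-1" score="0" symmetrical="true" then="BC-group-1"/>
  <rsc_order first="AZ-clone" id="AZ-before-BC-group-2" score="0" symmetrical="true" then="BC-group-2"/>
</constraints>
```

As noted in the quoted thread, advisory ordering changes what happens when a clone instance is
killed and restarted, so this trades one behavior for another rather than fixing the group case.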

On 3/1/2011 5:48 PM, Ron Kerry wrote:
> On 3/1/2011 2:39 PM, Ron Kerry wrote:
>  > On 2/28/2011 2:33 PM, Ron Kerry wrote:
>  >> Folks -
>  >>
>  >> I have a configuration issue that I am unsure how to resolve. Consider the following set of
>  >> resources.
>  >>
>  >> clone rsc1-clone rsc1 \
>  >> meta clone-max="2" target-role="Started"
>  >> primitive rsc1 ...
>  >> primitive rsc2 ... meta resource-stickiness="1"
>  >> primitive rsc3 ... meta resource-stickiness="1"
>  >>
>  >> Plus the following constraints
>  >>
>  >> colocation rsc2-with-clone inf: rsc2 rsc1-clone
>  >> colocation rsc3-with-clone inf: rsc3 rsc1-clone
>  >> order clone-before-rsc2 : rsc1-clone rsc2
>  >> order clone-before-rsc3 : rsc1-clone rsc3
>  >>
>  >>
>  >> I am getting the following behavior that is undesirable.
>  >>
>  >> During normal operation, a copy of the rsc1 resource is running on my two systems with rsc2 and rsc3
>  >> typically running split between the two systems. The rsc2 & rsc3 resources are operationally
>  >> dependent on a copy of rsc1 being up and running first.
>  >>
>  >> SystemA  SystemB
>  >> =======  =======
>  >> rsc1     rsc1
>  >> rsc2     rsc3
>  >>
>  >> If SystemB goes down, then rsc3 moves over to SystemA as expected
>  >>
>  >> SystemA  SystemB
>  >> =======  =======
>  >> rsc1      X X
>  >> rsc2       X
>  >> rsc3      X X
>  >>
>  >> When SystemB comes back into the cluster, crmd starts the rsc1 clone on SystemB but then also
>  >> restarts both rsc2 & rsc3. This means both are stopped and then both started again. This is not what
>  >> we want. We want these resources to remain running on SystemA until one of them is moved manually by
>  >> an administrator to re-balance them across the systems.
>  >>
>  >> How do we configure these resources/constraints to achieve that behavior? We are already using
>  >> resource-stickiness, but that is meaningless if crmd is going to be doing a restart of these
>  >> resources.
>  >>
>  >
>  > Using advisory (score="0") order constraints seems to achieve the correct behavior. I have not done
>  > extensive testing yet to see if other failover behaviors are broken with this approach, but initial
>  > basic testing looks good. It is always nice to answer one's own questions :-)
>  >
>  > colocation rsc2-with-clone inf: rsc2 rsc1-clone
>  > colocation rsc3-with-clone inf: rsc3 rsc1-clone
>  > order clone-before-rsc2 0: rsc1-clone rsc2
>  > order clone-before-rsc3 0: rsc1-clone rsc3
>  >
>  > Does anyone know of any specific problems with this approach??
>  >
>  >
>
> I set up a greatly simplified generic resource configuration:
>
> Online: [ elvis queen ]
>  Clone Set: A-clone [A]
>      Started: [ elvis queen ]
>  B-1    (ocf::rgk:typeB):       Started elvis
>  B-2    (ocf::rgk:typeB):       Started queen
>  Clone Set: stonith-l2network-set [stonith-l2network]
>      Started: [ elvis queen ]
>
> The A and B resources are just shell scripts that run an infinite while loop whose body is a
> sleep 5 command, so they run forever but do not consume machine resources.
>
> If I kill the A-clone running on queen, it just gets restarted and nothing at all happens to B-2 (it
> stays on queen and never knows any different). This is not optimal behavior for our purposes.
>
> However on the good side, if the A-clone cannot (re)start on queen, then B-2 does fail over to elvis
> as we expect.
>
> Does anybody have any ideas about how to get the proper behavior in all cases?
>

-- 

Ron Kerry         rkerry at sgi.com
Global Product Support - SGI Federal