[Pacemaker] About quorum control at the time of service stop (no-quorum-policy=freeze)

renayama19661014 at ybb.ne.jp renayama19661014 at ybb.ne.jp
Mon Sep 13 22:17:38 EDT 2010


Hi Andrew,

Thank you for comment.

As a conclusion, in the case of the freeze setting....

 * At the point in time the cluster is divided, the resources are kept where they are.
 * When a node is shut down gracefully in the divided configuration, the resources do migrate.
   -> That is, the resource is kept running within the divided configuration.

Is my understanding right?
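
(For reference, the property being tested here is cluster-wide; a minimal sketch of setting and checking it, assuming the crm shell that ships with Pacemaker 1.0:)

  # set the cluster-wide quorum policy to "freeze"
  crm configure property no-quorum-policy="freeze"

  # confirm the current value
  crm configure show | grep no-quorum-policy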

Best Regards,
Hideo Yamauchi.

--- Andrew Beekhof <andrew at beekhof.net> wrote:

> On Fri, Sep 10, 2010 at 7:22 AM,  <renayama19661014 at ybb.ne.jp> wrote:
> > Hi,
> >
> > We confirmed the behavior of no-quorum-policy=freeze in a four-node configuration.
> >
> > Of course, we understand that quorum control does not work well on the Heartbeat stack.
> >
> > We confirmed stopping the service on the four nodes with the following procedure.
> >
> > Step1) We start four nodes (3 ACT : 1 STB).
> >
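(The start procedure itself is not shown; on the Heartbeat stack it is presumably just starting the cluster service on every node, for example:)

  # start the cluster stack on each of srv01..srv04 (stock init script assumed)
  /etc/init.d/heartbeat start
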
> > Step2) We send cib.xml.
> >
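(The exact load command is not shown either; presumably something along the lines of the following cibadmin call, which replaces the running configuration with the prepared file:)

  # replace the live CIB with the prepared configuration (file name as mentioned in Step2)
  cibadmin --replace --xml-file cib.xml
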
> > ============
> > Last updated: Fri Sep 10 14:16:30 2010
> > Stack: Heartbeat
> > Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition with quorum
> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > 4 Nodes configured, unknown expected votes
> > 7 Resources configured.
> > ============
> >
> > Online: [ srv01 srv02 srv03 srv04 ]
> >
> >  Resource Group: Group01
> >      Dummy01    (ocf::heartbeat:Dummy): Started srv01
> >      Dummy01-2  (ocf::heartbeat:Dummy): Started srv01
> >  Resource Group: Group02
> >      Dummy02    (ocf::heartbeat:Dummy): Started srv02
> >      Dummy02-2  (ocf::heartbeat:Dummy): Started srv02
> >  Resource Group: Group03
> >      Dummy03    (ocf::heartbeat:Dummy): Started srv03
> >      Dummy03-2  (ocf::heartbeat:Dummy): Started srv03
> >  Resource Group: grpStonith1
> >      prmStonith1-3      (stonith:external/ssh): Started srv01
> >  Resource Group: grpStonith2
> >      prmStonith2-3      (stonith:external/ssh): Started srv02
> >  Resource Group: grpStonith3
> >      prmStonith3-3      (stonith:external/ssh): Started srv03
> >  Resource Group: grpStonith4
> >      prmStonith4-3      (stonith:external/ssh): Started srv04
> >
> > Step3) After the cluster is stable, we stop the first node.
> >
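(On the Heartbeat stack, "stopping a node" here would normally mean a graceful stop of the cluster service on srv01, for example:)

  # graceful shutdown of the cluster stack on srv01 (stock init script assumed)
  /etc/init.d/heartbeat stop
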
> > [root at srv02 ~]# crm_mon -1
> > ============
> > Last updated: Fri Sep 10 14:17:07 2010
> > Stack: Heartbeat
> > Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition with quorum
> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > 4 Nodes configured, unknown expected votes
> > 7 Resources configured.
> > ============
> >
> > Online: [ srv02 srv03 srv04 ]
> > OFFLINE: [ srv01 ]
> >
> >  Resource Group: Group01
> >      Dummy01    (ocf::heartbeat:Dummy): Started srv04 ---->FO
> >      Dummy01-2  (ocf::heartbeat:Dummy): Started srv04 ---->FO
> >  Resource Group: Group02
> >      Dummy02    (ocf::heartbeat:Dummy): Started srv02
> >      Dummy02-2  (ocf::heartbeat:Dummy): Started srv02
> >  Resource Group: Group03
> >      Dummy03    (ocf::heartbeat:Dummy): Started srv03
> >      Dummy03-2  (ocf::heartbeat:Dummy): Started srv03
> >  Resource Group: grpStonith1
> >      prmStonith1-3      (stonith:external/ssh): Started srv03
> >  Resource Group: grpStonith2
> >      prmStonith2-3      (stonith:external/ssh): Started srv02
> >  Resource Group: grpStonith3
> >      prmStonith3-3      (stonith:external/ssh): Started srv03
> >  Resource Group: grpStonith4
> >      prmStonith4-3      (stonith:external/ssh): Started srv04
> >
> >
> > Step4) Furthermore, after the cluster is stable again, we stop the next node.
> >  * Because the notification from ccm that quorum has been lost arrives late, the two
> >    remaining nodes move the resource.
> 
> That's not strictly true.
> The movement is initiated before the second node shuts down, so it is
> considered safe because we still had quorum at the point the decision
> was made.
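
(One way to see that the decision is computed while quorum is still held is to look at the transition the policy engine calculates before the node actually leaves; a sketch, assuming the ptest utility and its live-check option from Pacemaker 1.0:)

  # ask the policy engine what it would do with the current live CIB
  ptest -L -VVV

  # optionally save the transition graph for later inspection (file name is just an example)
  ptest -L -G transition.xml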
> 
> >
> > [root at srv03 ~]# crm_mon -1
> > ============
> > Last updated: Fri Sep 10 14:17:59 2010
> > Stack: Heartbeat
> > Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition with quorum
> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > 4 Nodes configured, unknown expected votes
> > 7 Resources configured.
> > ============
> >
> > Online: [ srv03 srv04 ]
> > OFFLINE: [ srv01 srv02 ]
> >
> >  Resource Group: Group01
> >      Dummy01    (ocf::heartbeat:Dummy): Started srv04
> >      Dummy01-2  (ocf::heartbeat:Dummy): Started srv04
> >  Resource Group: Group02
> >      Dummy02    (ocf::heartbeat:Dummy): Started srv04 ---->FO
> >      Dummy02-2  (ocf::heartbeat:Dummy): Started srv04 ---->FO
> >  Resource Group: Group03
> >      Dummy03    (ocf::heartbeat:Dummy): Started srv03
> >      Dummy03-2  (ocf::heartbeat:Dummy): Started srv03
> >  Resource Group: grpStonith1
> >      prmStonith1-3      (stonith:external/ssh): Started srv03
> >  Resource Group: grpStonith2
> >      prmStonith2-3      (stonith:external/ssh): Started srv04
> >  Resource Group: grpStonith3
> >      prmStonith3-3      (stonith:external/ssh): Started srv03
> >  Resource Group: grpStonith4
> >      prmStonith4-3      (stonith:external/ssh): Started srv04
> >
> > Step5) After the cluster is stable again, we stop one more node.
> >  * After we stopped it, the cib became have-quorum=0.
> >
> > [root at srv03 ~]# cibadmin -Q | more
> > <cib epoch="102" num_updates="3" admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0.1"
> >     have-quorum="0" dc-uuid="96faf899-13a6-4550-9d3b-b784f7241d06">
> >
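(A quicker way to check the quorum flag than paging through the whole CIB is simply to filter for it:)

  # show only the line of the cib element that carries have-quorum
  cibadmin -Q | grep have-quorum
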
> > Step6) Some resources moved to the last node.
> >
> > [root at srv04 ~]# crm_mon -1
> > ============
> > Last updated: Fri Sep 10 14:19:43 2010
> > Stack: Heartbeat
> > Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition WITHOUT quorum
> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > 4 Nodes configured, unknown expected votes
> > 7 Resources configured.
> > ============
> >
> > Online: [ srv04 ]
> > OFFLINE: [ srv01 srv02 srv03 ]
> >
> >  Resource Group: Group01
> >      Dummy01    (ocf::heartbeat:Dummy): Started srv04
> >      Dummy01-2  (ocf::heartbeat:Dummy): Started srv04
> >  Resource Group: Group02
> >      Dummy02    (ocf::heartbeat:Dummy): Started srv04
> >      Dummy02-2  (ocf::heartbeat:Dummy): Started srv04
> >  Resource Group: Group03
> >      Dummy03    (ocf::heartbeat:Dummy): Started srv04 ---->Why FO?
> >      Dummy03-2  (ocf::heartbeat:Dummy): Started srv04 ---->Why FO?
> 
> In this case, it is because a member of our partition owned the
> resource at the time we initiated the move.
> 
> Unfortunately the scenario here isn't quite testing what you had in mind.
> You only achieve the expected behavior if you remove the second and
> third machines from the cluster _ungracefully_.
> I.e. by fencing them or unplugging them.
> 
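(To reproduce the ungraceful case Andrew describes without physically unplugging a machine, one option, as an assumption on my part, is to kill the cluster stack or cut its network traffic on the node, for example:)

  # kill Heartbeat without a clean shutdown (simulates a node crash)
  killall -9 heartbeat

  # or drop all cluster traffic on the node's cluster interface (eth1 is just an example)
  iptables -A INPUT -i eth1 -j DROP
  iptables -A OUTPUT -o eth1 -j DROP
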
> >  Resource Group: grpStonith1
> >      prmStonith1-3      (stonith:external/ssh): Started srv04
> >  Resource Group: grpStonith2
> >      prmStonith2-3      (stonith:external/ssh): Started srv04
> >  Resource Group: grpStonith4
> >      prmStonith4-3      (stonith:external/ssh): Started srv04
> >
> >
> > We thought that the resources remaining on the surviving node in Step5 would not move at
> > the end, because no-quorum-policy=freeze is specified.
> 
> Freeze still allows recovery within a partition.
> Recovery can also occur for graceful shutdowns because the partition
> owned the resource beforehand.
> 
> > However, looking at the source code, it seems that a resource that is already started can
> > still move even with no-quorum-policy=freeze.
> >
> > (snip)
> > action_t *
> > custom_action(resource_t *rsc, char *key, const char *task,
> >               node_t *on_node, gboolean optional, gboolean save_action,
> >               pe_working_set_t *data_set)
> > {
> >         action_t *action = NULL;
> >         GListPtr possible_matches = NULL;
> >         CRM_CHECK(key != NULL, return NULL);
> >         CRM_CHECK(task != NULL, return NULL);
> > (snip)
> >                 } else if(is_set(data_set->flags, pe_flag_have_quorum) == FALSE
> >                           && data_set->no_quorum_policy == no_quorum_freeze) {
> >                         crm_debug_3("Check resource is already active");
> >                         if(rsc->fns->active(rsc, TRUE) == FALSE) {
> >                                 action->runnable = FALSE;
> >                                 crm_debug("%s\t%s (cancelled : quorum freeze)",
> >                                           action->node->details->uname,
> >                                           action->uuid);
> >                         }
> >
> >                 } else {
> 
=== The rest of the message has been omitted ===




