[Pacemaker] [Question] About the recovery procedure from a state in which the nodes were divided.

Mon Nov 15 08:09:55 UTC 2010

Hi Andrew,

Thank you for your comment.

> >  Step3) Make "/var/lib/heartbeat/crm/" clean.
> >         Make it clean on all nodes.
> >  Step4) Start all four nodes.
> >  Step5) Send the cib information to the cluster.
> >  Step6) The cluster is rebuilt.
> >
> >
> > We do not want to use the second method,
> > because all resources stop when we use it.
> >
> > Is there any problem with the first method that we used?
> 
> Step 3 should not be necessary, but otherwise there is nothing wrong
> with the first method.
> That usage is essentially what it was designed for.

Really?

Without Step 3, I think the bug that I reported before is likely to occur.
 * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2508

I think this bug is the reason why Step 3 is necessary.
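
For illustration only, the first method can be written down as a dry-run plan, with Step 3 switchable so that both variants (with and without cleaning) can be compared. This is a minimal sketch, not a tested recovery tool: the node names, the use of ssh, and the /etc/init.d/heartbeat init script path are assumptions that do not come from this thread. Only the directory "/var/lib/heartbeat/crm/" is taken from the procedure itself. Nothing is executed; the function only returns the command strings.

```python
from typing import List

# Directory named in Step 3 of the procedure.
CRM_DIR = "/var/lib/heartbeat/crm"


def first_method_plan(stopped_nodes: List[str], clean_cib: bool = True) -> List[str]:
    """Return the commands of the first method, in order, as plain strings.

    Nothing is run here. The ssh invocation and the init script path
    are assumptions made for this sketch.
    """
    plan: List[str] = []

    # Step 1: stop the cluster stack on every node of one side.
    for node in stopped_nodes:
        plan.append(f"ssh {node} /etc/init.d/heartbeat stop")

    # Step 2: manual work, kept as a comment line in the plan.
    plan.append("# fix the cause of the division by hand (e.g. replace a network card)")

    # Step 3: optionally remove the local CIB files on the stopped nodes.
    if clean_cib:
        for node in stopped_nodes:
            plan.append(f"ssh {node} 'rm -f {CRM_DIR}/*'")

    # Step 4: start the stopped nodes again so that they rejoin the cluster.
    for node in stopped_nodes:
        plan.append(f"ssh {node} /etc/init.d/heartbeat start")

    return plan


if __name__ == "__main__":
    # "node3" and "node4" are hypothetical host names.
    for line in first_method_plan(["node3", "node4"]):
        print(line)
```

Calling first_method_plan(..., clean_cib=False) gives the same plan without Step 3, which is the variant Andrew describes as sufficient.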

> Hope that answers your question.

Thanks.
If Step 3 is not necessary, that would be splendid.

I will examine the problem a little more and report back.

Best Regards,
Hideo Yamauchi.

--- Andrew Beekhof <andrew at beekhof.net> wrote:

> On Thu, Nov 4, 2010 at 2:44 AM,  <renayama19661014 at ybb.ne.jp> wrote:
> > Hi All,
> >
> > We tested the recovery procedure from a state in which the nodes were divided.
> > (Of the four nodes, three are active and one is a standby.)
> >
> > This is the recovery from a state divided into two nodes on each side, with no-quorum-policy="freeze" set.
> >
> > With the freeze setting, the resources keep their state as it was after the division.
> > (We tested using a special RA to work around the late recognition of the node division by ccm in Heartbeat.)
> >
> >
> > We confirmed several recovery patterns,
> > and we thought that the following method was desirable.
> >
> > * The first method. (With this method, not all resources stop.)
> >  Step1) Stop all the nodes on one side of the division.
> >  Step2) Fix the problem that caused the division. (For example, replace a network card.)
> >  Step3) Make "/var/lib/heartbeat/crm/" clean.
> >         Make it clean on the nodes that were stopped.
> >  Step4) Start the two nodes that were stopped.
> >  Step5) The cluster is rebuilt.
> >
> > * The second method. (However, all resources stop with this method.)
> >  Step1) Stop all four nodes.
> >  Step2) Fix the problem that caused the division. (For example, replace a network card.)
> >  Step3) Make "/var/lib/heartbeat/crm/" clean.
> >         Make it clean on all nodes.
> >  Step4) Start all four nodes.
> >  Step5) Send the cib information to the cluster.
> >  Step6) The cluster is rebuilt.
> >
> >
> > We do not want to use the second method,
> > because all resources stop when we use it.
> >
> > Is there any problem with the first method that we used?
> 
> Step 3 should not be necessary, but otherwise there is nothing wrong
> with the first method.
> That usage is essentially what it was designed for.
> 
> Hope that answers your question.
> 
> >
> > Is there a recovery method that the community recommends for a division under the freeze setting?
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> >
> 