[Pacemaker] [Problem] The monitor of the Master stops when the resource is moved repeatedly with the crm command.

renayama19661014 at ybb.ne.jp renayama19661014 at ybb.ne.jp
Mon Nov 14 19:08:36 EST 2011


Hi Dejan,

> Patch applied. Many thanks!
All right.

Thanks!
Hideo Yamauchi.

--- On Tue, 2011/11/15, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:

> Hi Hideo-san,
> 
> On Mon, Nov 14, 2011 at 09:50:21AM +0900, renayama19661014 at ybb.ne.jp wrote:
> > Hi Dejan,
> > 
> > I attach the corrected patch for Reusable-Cluster-Components-glue--3b800f73ba59.
> 
> Patch applied. Many thanks!
> 
> Dejan
> 
> > Best Regards,
> > Hideo Yamauchi.
> > 
> > 
> > --- On Sat, 2011/11/12, renayama19661014 at ybb.ne.jp <renayama19661014 at ybb.ne.jp> wrote:
> > 
> > > Hi Dejan,
> > > 
> > > Thank you for your comment.
> > > 
> > > This correction appears to be necessary for cancelling the monitor of the Master/Slave resource.
> > > With it, the Master/Slave resource ends up in the correct state.
> > > I tested it, and the correction solved the problem.
> > > 
> > > Best Regards,
> > > Hideo Yamauchi.
> > > 
> > > --- On Sat, 2011/11/12, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> > > 
> > > > Hi Hideo-san,
> > > > 
> > > > On Fri, Nov 11, 2011 at 12:03:28PM +0900, renayama19661014 at ybb.ne.jp wrote:
> > > > > Hi All,
> > > > > 
> > > > > The underlying problem is that lrmd does not return the result of a cancellation request that it has left pending.
> > > > > 
> > > > > However, crmd needs the result of the cancellation.
> > > > > 
> > > > > Isn't the following correction necessary?
> > > > > (This correction does not handle errors properly; it is only a temporary fix.)
> > > > > 
> > > > > Example:
> > > > > (snip)
> > > > >         if (!op->is_cancelled) {
> > > > >                 if( !record_op_completion(rsc,op) ) { /*record the outcome of the op */
> > > > >                         if (op->interval) /* copy op to the repeat list */
> > > > >                                 to_repeatlist(rsc,op);
> > > > >                 }
> > > > >         } else {
> > > > >                 ha_msg_mod_int(op->msg, F_LRM_OPSTATUS, (int)LRM_OP_CANCELLED); /* appended */
> > > > >                 op_status = LRM_OP_CANCELLED;                                   /* appended */
> > > > 
> > > > In this case crmd should get HA_RSC_BUSY as a return code for the
> > > > operation cancel request. IIRC, that means that it should retry
> > > > later (though I may be wrong). If it doesn't retry, then we
> > > > should indeed add what you suggest.
> > > > 
> > > > Did you already try this patch? Does it help?
> > > > 
> > > > Cheers,
> > > > 
> > > > Dejan
> > > > 
> > > > >                 remove_op_history(op);
> > > > >         }
> > > > > (snip)
> > > > > 
> > > > > 
> > > > > 
> > > > > Best Regards,
> > > > > Hideo Yamauchi.
> > > > > 
> > > > > 
> > > > > --- On Wed, 2011/11/9, renayama19661014 at ybb.ne.jp <renayama19661014 at ybb.ne.jp> wrote:
> > > > > 
> > > > > > Hi All,
> > > > > > 
> > > > > > This phenomenon sometimes occurs even with 1.0.10.
> > > > > > 
> > > > > > The cause seems to be that lrmd leaves the cancellation of the monitor pending.
> > > > > > As a result, the monitor's active-operation information remains in the CIB.
> > > > > > 
> > > > > > Best Regards,
> > > > > > Hideo Yamauchi.
> > > > > > 
> > > > > > --- On Tue, 2011/11/8, renayama19661014 at ybb.ne.jp <renayama19661014 at ybb.ne.jp> wrote:
> > > > > > 
> > > > > > > Hi All,
> > > > > > > 
> > > > > > > 
> > > > > > > We tested moving a resource in a Master/Slave configuration.
> > > > > > > 
> > > > > > > ============
> > > > > > > Last updated: Tue Nov  8 14:12:23 2011
> > > > > > > Stack: Heartbeat
> > > > > > > Current DC: bl460g1b (1b34eec8-1d62-488b-a7fb-8e4b38f95ec3) - partition with quorum
> > > > > > > Version: 1.0.11-9af47ddebcad19e35a61b2a20301dc038018e8e8
> > > > > > > 2 Nodes configured, unknown expected votes
> > > > > > > 3 Resources configured.
> > > > > > > ============
> > > > > > > 
> > > > > > > Online: [ bl460g1a bl460g1b ]
> > > > > > > 
> > > > > > >  Resource Group: master-group
> > > > > > >      vip-master (ocf::heartbeat:Dummy): Started bl460g1a
> > > > > > >      vip-rep    (ocf::heartbeat:Dummy): Started bl460g1a
> > > > > > >  Master/Slave Set: msPostgresql
> > > > > > >      Masters: [ bl460g1a ]
> > > > > > >      Slaves: [ bl460g1b ]
> > > > > > >  Clone Set: clnPingCheck
> > > > > > >      Started: [ bl460g1a bl460g1b ]
> > > > > > > 
> > > > > > > Migration summary:
> > > > > > > * Node bl460g1b: 
> > > > > > > * Node bl460g1a: 
> > > > > > > 
> > > > > > > 
> > > > > > > I changed the monitor handling of the Stateful RA as follows:
> > > > > > > (snip)
> > > > > > > stateful_monitor() {
> > > > > > >     echo "Stateful monitor" >> /tmp/test.log
> > > > > > >     stateful_check_state "master"
> > > > > > > (snip)
> > > > > > > 
> > > > > > > I repeated the move with the following script:
> > > > > > > 
> > > > > > > #!/bin/sh
> > > > > > > i=1
> > > > > > > while true; do
> > > > > > >   echo "##############################" >> /tmp/test.log
> > > > > > >   echo "move $i"
> > > > > > >   crm resource move vip-rep
> > > > > > >   echo "sleep"
> > > > > > >   sleep 60
> > > > > > >   crm resource unmove vip-rep
> > > > > > >   i=$((i + 1))
> > > > > > > done
> > > > > > > 
> > > > > > > 
> > > > > > > After the script has repeated the move for a while, the monitor of the Master stops being executed.
> > > > > > > (The problem reproduces fairly often if the script is left running.)
> > > > > > > 
> > > > > > > 
> > > > > > > This problem seems to occur both in the latest code of the 1.0 branch and in 1.0.11:
> > > > > > > 
> > > > > > >  * Pacemaker-1-0-9af47ddebcad
> > > > > > >  * Pacemaker-1-0-6e010d6b0d49 
> > > > > > > 
> > > > > > > 
> > > > > > > The monitor stopping is a serious problem.
> > > > > > > I would like to request a fix.
> > > > > > > 
> > > > > > > I have registered these details and the hb_report output in Bugzilla:
> > > > > > > 
> > > > > > >  * http://bugs.clusterlabs.org/show_bug.cgi?id=5010
> > > > > > > 
> > > > > > > 
> > > > > > > Best Regards,
> > > > > > > Hideo Yamauchi.
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > > > > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > > > > > > 
> > > > > > > Project Home: http://www.clusterlabs.org
> > > > > > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > > > > > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> > > > > > > 
> > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > >
> 
> 
> 



