[Pacemaker] Should monitor operations be stopped after a resource is unmanaged?
Tim Serong
tserong at novell.com
Mon Apr 4 06:27:13 UTC 2011
On 4/4/2011 at 04:29 AM, Ron Kerry <rkerry at sgi.com> wrote:
> Tim Serong wrote:
> > On 4/2/2011 at 09:42 PM, Ron Kerry <rkerry at sgi.com> wrote:
> > > Serge Dubrouski wrote:
> > > > On Fri, Apr 1, 2011 at 2:09 PM, Ron Kerry <rkerry at sgi.com> wrote:
> > > > > Pavel Levshin wrote:
> > > > >>
> > > > >> 01.04.2011 18:36, Ron Kerry:
> > > > >> > Folks -
> > > > >> >
> > > > >> > Consider a running cluster with all resources managed. We want to stop
> > > > >> > and quickly restart a particular resource without impacting other
> > > > >> > resources. The software stack running on the system can deal with this
> > > > >> > sort of temporary outage. We perform the following actions:
> > > > >> > * unmanage the resource
> > > > >> > * stop the resource
> > > > >> > * start the resource
> > > > >> > * manage the resource
> > > > >> >
> > > > >> > The above procedure is sometimes successful. However, we will also
> > > > >> > sometimes get a resource monitor failure after stopping the resource.
> > > > >> > It is clear that the monitor operation was not stopped (at least not
> > > > >> > immediately) by unmanaging the resource.
> > > > >>
> > > > >> Unmanaged resource cannot be started and stopped, but can still be
> > > > >> monitored.
> > > > >
> > > > > So unmanaged really means the resource is still being managed to some
> > > > > degree?
> > > >
> > > > It means that Pacemaker still wants to know its state. What kind of
> > > > problem does it create?
> > > >
> > >
> > > An unmanaged resource whose monitor is still running will cause a
> > > monitor failure when the resource is stopped. Pacemaker then takes
> > > the 'on-fail' action defined for the monitor operation. In other
> > > words, the resource is still being managed to some degree. If the
> > > monitor operation were still running but no action were taken as a
> > > result of its outcome, there would be no issue.
> >
> > Try "crm configure property maintenance-mode=true". Admittedly this
> > affects the entire cluster, but it will ensure no starts, stops or
> > monitors...
> >
> > Regards,
> >
> > Tim
>
> Tim -
>
> Thanks, this does work, but it is rather like using a sledgehammer to do
> the work of a ball-peen hammer. It unmanages ALL resources and stops all
> the monitor operations.
Very true.
> How do we go about requesting a change to pacemaker to achieve the desired
> behavior?
File an enhancement request at:
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> As I see it there are two options:
>
> 1. fix 'crm resource unmanage <rsc>' to also stop the individual
>    resource monitor
>
> -or-
>
> 2. create a 'crm resource maintenance <rsc>' to unmanage and stop the
>    individual resource monitor
I'd be going for option 2.
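In the meantime, the maintenance-mode workaround can at least be kept brief by wrapping it tightly around the manual restart. Below is a rough sketch, not tested against a live cluster. The function names, the DRY_RUN switch and the restart command passed in are placeholders of my own; only the "crm configure property maintenance-mode=..." calls come from this thread.

```shell
#!/bin/sh
# Sketch: restart one service by hand while the whole cluster is in
# maintenance mode, then hand control back to Pacemaker.
# Assumption: the service is restarted outside Pacemaker by a command
# supplied by the caller (hypothetical placeholder, e.g. an init script).

# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_in_maintenance() {
    restart_cmd="$1"

    # Cluster-wide: no starts, stops or monitors while this is set.
    run crm configure property maintenance-mode=true || return 1

    # Manual restart of the one service we care about.
    run sh -c "$restart_cmd"
    rc=$?

    # Always leave maintenance mode, even if the restart failed, so the
    # cluster resumes monitoring and can react to the real state.
    run crm configure property maintenance-mode=false || return 1

    return "$rc"
}
```

Calling restart_in_maintenance "/etc/init.d/myservice restart" with DRY_RUN=1 prints the three commands in order; the init script path is only an example. Every other resource is unmonitored for the duration, so the window should be kept as short as possible.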
Regards,
Tim
--
Tim Serong <tserong at novell.com>
Senior Clustering Engineer, OPS Engineering, Novell Inc.