[Pacemaker] Enable remote monitoring

David Vossel dvossel at redhat.com
Wed Jan 23 16:36:10 UTC 2013



----- Original Message -----
> From: "Yan Gao" <ygao at suse.com>
> To: pacemaker at oss.clusterlabs.org
> Sent: Monday, January 21, 2013 11:28:40 PM
> Subject: Re: [Pacemaker] Enable remote monitoring
> 
> Hi,
> Here's the code for supporting nagios plugins in lrmd:
> 
> https://github.com/gao-yan/pacemaker/commits/nagios
> 
> A new resource class "nagios" is introduced.
> 
> Actions:
> 
> - probe: A resource defined for a resource container is not probed. (We
> can also add a condition in pengine to just avoid probing a nagios class
> resource.)

Yeah, I think the pengine should know never to probe a nagios script, regardless of whether it is in a container.

> - start: Invokes the nagios plugin with specified parameters (Maps the
> instance attributes to the long options of the nagios plugin). If it
> returns non-OK, re-invokes it after some delay (delay = start_timeout / 10),
> until it returns OK or exceeds the start timeout.

I made a comment about this on the patch.  Shouldn't the cmd->timeout value be updated each time it is re-scheduled to account for time already spent?
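
To make the timeout question concrete, here is a minimal sketch of the bookkeeping. It is written in Python purely for illustration (the lrmd code is C). It assumes times are in milliseconds and that the retry delay stays fixed at start_timeout / 10, as described above. The function names are hypothetical and not part of the patch.

```python
def remaining_timeout_ms(start_timeout_ms, elapsed_ms):
    """Time budget left for the next attempt, never negative."""
    return max(0, start_timeout_ms - elapsed_ms)


def plan_next_attempt(start_timeout_ms, elapsed_ms):
    """Return (delay_ms, timeout_ms) for the next attempt.

    Returns None when waiting out the delay would use up the whole
    start timeout, i.e. there is no point in re-scheduling.
    """
    delay_ms = start_timeout_ms // 10  # delay = start_timeout / 10
    # The retry only begins after the delay, so the delay counts as spent time.
    timeout_ms = remaining_timeout_ms(start_timeout_ms, elapsed_ms + delay_ms)
    if timeout_ms <= 0:
        return None
    return delay_ms, timeout_ms
```

With a 20s start timeout and 3s already spent, the next attempt would be scheduled 2s later with a 15s timeout instead of the full 20s.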

> 
> - monitor: Recurring invocation to the nagios plugin with specified
> parameters.
> 
> - stop: Nothing special is done. The recurring monitor is canceled
> anyway.
> 
> - metadata: Reads the corresponding metadata from an XML file in
> NAGIOS_METADATA_DIR.
> 
> (As we know, nagios plugins don't support metadata. The current plan is
> to generate the corresponding metadata according to the help of the
> plugins, and put them into NAGIOS_METADATA_DIR for use -- Dejan already
> has progress on this. Thanks, Dejan!)
> 
> 
> For nagios plugins, the exit codes are:
> 
> STATE_OK        = 0,
> STATE_WARNING   = 1,
> STATE_CRITICAL  = 2,
> STATE_UNKNOWN   = 3,
> STATE_DEPENDENT = 4,
> 
> AFAICS, STATE_OK should map to PCMK_EXECRA_OK, and the others should all
> belong to PCMK_EXECRA_UNKNOWN_ERROR. Well, apparently, there's no code
> to express "NOT_RUNNING" in nagios plugins. I think it should be fine,
> since there's no probe.
> 
> Any suggestions are appreciated!
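
The proposed exit code mapping can be sketched as follows, in Python for illustration only. The STATE_* values are the ones listed above. The numeric values given to PCMK_EXECRA_OK and PCMK_EXECRA_UNKNOWN_ERROR are an assumption (they mirror the OCF "success" and "generic error" codes), and the function name is hypothetical.

```python
# Nagios plugin exit codes, as listed in the mail above.
STATE_OK = 0
STATE_WARNING = 1
STATE_CRITICAL = 2
STATE_UNKNOWN = 3
STATE_DEPENDENT = 4

# Assumed numeric values, mirroring the OCF success / generic error codes.
PCMK_EXECRA_OK = 0
PCMK_EXECRA_UNKNOWN_ERROR = 1


def nagios_rc_to_execra(rc):
    """Map a nagios exit code to a PCMK_EXECRA_* code.

    Only STATE_OK counts as success. Everything else, including codes
    outside the documented range, is treated as an unknown error.
    Nothing maps to "not running" because these resources are not probed.
    """
    return PCMK_EXECRA_OK if rc == STATE_OK else PCMK_EXECRA_UNKNOWN_ERROR
```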

This mostly looks like what I expected.  I'm letting the whole re-scheduling of the start operation roll around in my head a bit.  It almost seems like that functionality belongs in the service library...  retry executing this action until either the timeout is hit or some target return code is encountered.  Any thoughts on that?
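
As a rough illustration of what such a generic helper might look like, here is a sketch in Python (not the services library API, which is C). The name retry_until, its parameters, and the injectable clock and sleep functions are all hypothetical. The action callback receives the time still remaining, so each attempt can run with a reduced timeout.

```python
import time


def retry_until(action, target_rc, timeout_s, delay_s,
                clock=time.monotonic, sleep=time.sleep):
    """Run action until it returns target_rc or timeout_s has elapsed.

    action(remaining_s) is called with the time left before the deadline.
    Returns (last_rc, attempts); last_rc is None if no attempt was made.
    """
    deadline = clock() + timeout_s
    attempts = 0
    last_rc = None
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return last_rc, attempts
        last_rc = action(remaining)
        attempts += 1
        if last_rc == target_rc:
            return last_rc, attempts
        # Never sleep past the deadline.
        sleep(min(delay_s, max(0.0, deadline - clock())))
```

A caller that wants the start behaviour described above would pass target_rc=0 and delay_s=timeout_s / 10. Other actions could use the same helper with a different target code.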

-- Vossel

> Thanks,
>   Gao,Yan
> 
> --
> Gao,Yan <ygao at suse.com>
> Software Engineer
> China Server Team, SUSE.
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 



