[Pacemaker] Enable remote monitoring
Gao,Yan
ygao at suse.com
Thu Nov 8 06:24:40 UTC 2012
Hi Andrew,
On 11/08/12 13:09, Andrew Beekhof wrote:
> On Tue, Nov 6, 2012 at 10:30 PM, Gao,Yan <ygao at suse.com> wrote:
>> Hi,
>>
>> Currently, we can manage VMs via the VM agents, but the services running
>> within the VMs are not easy to monitor. If we could run nagios/icinga
>> probes from the host against the guest, we could achieve this.
>>
>> Lars, Dejan and I have been discussing this for some time, and we have
>> considered several ways to implement it. We are now leaning toward a
>> proposal from Lars. Let me introduce the idea here and see what you
>> think of it.
>>
>> First, we could add a resource agent class. The RAs belonging to this
>> class wrap nagios/icinga probes. They can be configured as special
>> monitor operations for the VMs and should behave as follows:
>>
>> 1. The special monitor operations start working after the VMs and the
>> services inside are started.
>>
>> 2. Any failure of the monitor operations is treated as a failure of
>> the VM, which triggers recovery of the VM.
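To make the wrapper class a little more concrete, here is a minimal Python sketch of how such an RA might run a nagios/icinga plugin and turn its exit status into an OCF monitor result. The plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard plugin convention; the mapping onto OCF codes, treating WARNING as healthy, and the helper names `nagios_to_ocf` and `run_probe` are assumptions for illustration only.

```python
import subprocess
import sys

# Standard nagios/icinga plugin exit codes.
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3

# OCF return codes a monitor operation may report.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def nagios_to_ocf(exit_code):
    """Map a plugin exit code to an OCF monitor result (hypothetical helper).

    Assumption: WARNING still counts as healthy; CRITICAL, UNKNOWN and any
    unexpected code become a generic error, i.e. a failure of the VM.
    """
    if exit_code in (NAGIOS_OK, NAGIOS_WARNING):
        return OCF_SUCCESS
    return OCF_ERR_GENERIC


def run_probe(argv, timeout):
    """Run a plugin command line and return the mapped OCF result.

    A probe that does not finish within `timeout` seconds counts as a failure.
    """
    try:
        completed = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return OCF_ERR_GENERIC
    return nagios_to_ocf(completed.returncode)


if __name__ == "__main__":
    # Stand-in for a real plugin: a child process that exits with CRITICAL.
    fake_plugin = [sys.executable, "-c", "import sys; sys.exit(2)"]
    print(run_probe(fake_plugin, timeout=5))  # prints 1 (OCF_ERR_GENERIC)
```

This sketch reports a generic error rather than "not running" because, per point 2, a failing probe means the VM needs recovery, not that the VM is known to be stopped.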
>>
>> Here is an example:
>>
>> primitive db-vm ocf:heartbeat:VirtualDomain \
>> params config="db-vm" hypervisor="xen:///" \
>> ip="192.168.1.122" \
>> op monitor nagios:ftp interval="30s" params user="test"
>>
>> The "nagios:ftp" specifies which monitor agent is used to monitor the
>> VM. It is an optional attribute group expressing the "class/provider/type"
>> of the monitor agent, and it defaults to "ocf:heartbeat:VirtualDomain"
>> for this VM (in which case the monitor is a normal one, as we usually
>> configure). More monitors, such as a "nagios:www" type, can be added in
>> the same way.
>
> What do you propose the XML should look like?
It should look like:
...
<op id="vm-monitor-30" name="monitor" class="nagios" type="ftp"
interval="30s" ignore-first-failures="true">
<instance_attributes id="vm-monitor-30-params">
<nvpair id="vm-monitor-30-params-user" name="user" value="test"/>
</instance_attributes>
</op>
...
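The defaulting rule from the original proposal (an op with no class/type is an ordinary monitor of the primitive's own agent) could be resolved against the XML form roughly as follows, using Python's xml.etree.ElementTree. The attribute names are taken from the XML above; the sketch embeds its own well-formed copy of the op, and `resolve_monitor_agent` and `op_params` are hypothetical helpers, not Pacemaker code.

```python
import xml.etree.ElementTree as ET

OP_XML = """
<op id="vm-monitor-30" name="monitor" class="nagios" type="ftp"
    interval="30s" ignore-first-failures="true">
  <instance_attributes id="vm-monitor-30-params">
    <nvpair id="vm-monitor-30-params-user" name="user" value="test"/>
  </instance_attributes>
</op>
"""


def resolve_monitor_agent(op, primitive_agent):
    """Return (class, provider, type) for a monitor op (hypothetical rule).

    An op without a class is an ordinary monitor of the primitive's own
    agent; otherwise the provider may be absent, as for nagios.
    """
    op_class = op.get("class")
    if op_class is None:
        return primitive_agent
    return (op_class, op.get("provider"), op.get("type"))


def op_params(op):
    """Collect the op's own instance attributes as a dict."""
    return {
        nv.get("name"): nv.get("value")
        for nv in op.iterfind("instance_attributes/nvpair")
    }


if __name__ == "__main__":
    op = ET.fromstring(OP_XML.strip())
    vm_agent = ("ocf", "heartbeat", "VirtualDomain")
    print(resolve_monitor_agent(op, vm_agent))  # prints ('nagios', None, 'ftp')
    print(op_params(op))                        # prints {'user': 'test'}
```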
>
>> We can specify particular "params" for a monitor. The "ip" is not
>> actually a parameter VirtualDomain uses; we put it there for its
>> monitor operations to inherit, so that we don't have to specify it for
>> each monitor separately.
>
> You plan to add 'ip' to the VirtualDomain metadata?
It should be in the metadata of nagios:ftp and of the other monitor
agents. We'd like parameter inheritance to avoid repeating configuration.
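A minimal sketch of the parameter inheritance described above, assuming the rule is: a monitor agent receives those primitive parameters that its own metadata declares, and the op's own params take precedence. The function name `inherited_params` and the metadata set used in the example are hypothetical.

```python
def inherited_params(primitive_params, op_params, agent_params):
    """Build the parameter set passed to a monitor agent (hypothetical rule).

    Take only the primitive's parameters that the monitor agent declares in
    its metadata (agent_params), then let the op's own params override them.
    """
    merged = {k: v for k, v in primitive_params.items() if k in agent_params}
    merged.update(op_params)
    return merged


if __name__ == "__main__":
    db_vm = {"config": "db-vm", "hypervisor": "xen:///", "ip": "192.168.1.122"}
    ftp_op = {"user": "test"}
    # Assumed metadata of a nagios:ftp agent.
    ftp_metadata = {"ip", "port", "user"}
    print(inherited_params(db_vm, ftp_op, ftp_metadata))
    # prints {'ip': '192.168.1.122', 'user': 'test'}
```

Filtering by the monitor agent's metadata keeps VirtualDomain-only parameters such as config and hypervisor from being passed to the probe.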
>
>>
>>
>> Other issues:
>> - As we can see, there is a time window after the VM has started but
>> before the monitored service is up. One solution is to add a
>> "first-failure" flag to the monitor operation, which would allow us to
>> ignore the *first* failures of a monitor until it has returned healthy
>> once, unless the timeout expires. Ideally, this would be handled in the
>> LRM.
>
> What happens if there is never a first success?
> The cluster will never find out.
It will reach the timeout and return. I think we should configure a
reasonable monitor timeout.
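The first-failure behavior, including the case where there is never a first success, could be modeled as a small filter in front of the monitor results. This is a sketch assuming a fixed grace window measured from when the monitor starts; `FirstFailureFilter` and `grace_seconds` are hypothetical names, and where the LRM would actually track this state is left open.

```python
import time

OCF_SUCCESS = 0


class FirstFailureFilter:
    """Suppress monitor failures until the first success (hypothetical
    semantics of the proposed first-failure flag).

    Failures are only ignored for `grace_seconds` after the filter is
    created; after that, if no success has been seen, they are reported,
    so the cluster still finds out when the service never comes up.
    """

    def __init__(self, grace_seconds, clock=time.monotonic):
        self.grace_seconds = grace_seconds
        self.clock = clock
        self.started = clock()
        self.seen_success = False

    def report(self, rc):
        """Return the result to pass on, or None if it is suppressed."""
        if rc == OCF_SUCCESS:
            self.seen_success = True
            return rc
        in_grace = self.clock() - self.started < self.grace_seconds
        if not self.seen_success and in_grace:
            return None  # service may still be starting inside the VM
        return rc


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    f = FirstFailureFilter(grace_seconds=120, clock=lambda: now[0])
    print(f.report(1))  # prints None: failure ignored during start-up
    now[0] = 150.0
    print(f.report(1))  # prints 1: window expired without any success
```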
>
>>
>> - One limitation is that we would have to specify a different monitor
>> interval for each service within a VM, since a resource's operations
>> are distinguished by name and interval. We could probably fix this in
>> some way eventually.
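In Pacemaker, operations on one resource are identified by their name and interval, so two nagios monitors on the same VM cannot share an interval. A sketch of a check that flags such clashes; `find_interval_clashes` is a hypothetical helper, and `parse_interval` only handles bare seconds and the s/m/h suffixes, a subset of the interval formats Pacemaker accepts.

```python
from collections import defaultdict


def parse_interval(value):
    """Convert '30s', '2m', '1h' or a bare number of seconds to seconds.

    Only a subset of the interval formats Pacemaker accepts.
    """
    units = {"s": 1, "m": 60, "h": 3600}
    if value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def find_interval_clashes(ops):
    """Return {(name, seconds): [op ids]} for pairs used by more than one op."""
    seen = defaultdict(list)
    for op_id, name, interval in ops:
        seen[(name, parse_interval(interval))].append(op_id)
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    db_vm_ops = [
        ("vm-monitor-30", "monitor", "30s"),   # hypothetical nagios:ftp monitor
        ("vm-monitor-www", "monitor", "30s"),  # hypothetical nagios:www monitor
        ("vm-monitor-60", "monitor", "1m"),
    ]
    print(find_interval_clashes(db_vm_ops))
    # prints {('monitor', 30): ['vm-monitor-30', 'vm-monitor-www']}
```

Normalizing to seconds first means that "1m" and "60s" are also caught as the same interval.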
>>
>>
>> Anyway, this is the most straightforward solution we can think of so far
>> (please correct me if I'm missing anything). It's open for discussion.
>> Any comments and suggestions are welcome and appreciated.
>
> Doesn't look too bad. There are some finer points to discuss, but I'm
> sure we can reach agreement.
Nice, thanks!
Regards,
Gao,Yan
--
Gao,Yan <ygao at suse.com>
Software Engineer
China Server Team, SUSE.