[Pacemaker] How to send email-notification on failure of resource in cluster frame work
Vadym Chepkov
vchepkov at gmail.com
Wed Mar 30 11:37:44 UTC 2011
On Mar 29, 2011, at 11:34 PM, Michael Schwartzkopff wrote:
>> On Mar 29, 2011 6:12 AM, "Michael Schwartzkopff" <misch at clusterbau.com> wrote:
>>>> On Tue, Mar 29, 2011 at 3:29 AM, Vadym Chepkov <vchepkov at gmail.com> wrote:
>>>>> On Mar 24, 2011, at 12:46 AM, Rakesh K wrote:
>>>>>> Hi all,
>>>>>> Is there any way to send email notifications when a resource fails
>>>>>> in the cluster framework?
>>>>>>
>>>>>> I was going through the "Pacemaker Explained" document provided on
>>>>>> the website www.clusterlabs.org
>>>>>>
>>>>>> There was no content in chapter 7, which covers sending email
>>>>>> notification events.
>>>>>>
>>>>>> Can anybody help me with this?
>>>>>>
>>>>>> For now I am using crm_mon --daemonize --as-html <path to
>>>>>> file> to maintain the status of HA in an html file.
>>>>>>
>>>>>> Is there any other approach for sending email notifications?
>>>>>
>>>>> Last time I checked, crm_mon is not well suited for this purpose.
>>>>>
>>>>> crm_mon has the following option
>>>>>
>>>>> -T, --mail-to=value
>>>>>
>>>>> Send Mail alerts to this user. See also
>>>>>
>>>>> --mail-from, --mail-host, --mail-prefix
>>>>>
>>>>> But you will end up with an obscene amount of e-mails; I was blocked
>>>>> from gmail when I tried to use it once :) For one resource failure
>>>>> you will get 4 e-mails: monitor, stop, start, monitor. Now imagine if
>>>>> it was the most significant member of a group or, worse, a node failure...
>>>>>
>>>>> nagios would be better suited for this purpose, but, unfortunately,
>>>>> crm_mon has been broken
>>>>> (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2344) for
>>>>> quite a while.
>>>>
>>>> The fix is going to have to come from the community, I don't have any
>>>> knowledge of nagios
>>>>
>>>>> I am yet to find a good monitoring solution for pacemaker, hopefully
>>>>> somebody had more success and will share.
>>>
>>> Use SNMP. It is the standard protocol for monitoring. Add an "extend" line
>>> to your snmpd.conf to call a script that returns the number of failcounts.
>>> You can easily monitor this with any NMS. For nagios use check_snmp.
>>
>> I'm afraid it won't be able to tell more than "stuff happened" :(
>> Would it?
>
> Yes. Like a good NMS always does. To analyse the error you still have to read
> the logs yourself.
>
What I meant was, I can't see how one "extend" line will be able to supply specifics about exactly which resource has failed.
Would you kindly share an example?
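
For illustration only, here is a minimal sketch of the kind of script an "extend" line could call so that the output names the failed resource and not just a total. It is not taken from this thread and rests on several assumptions: that the cluster status XML (for example, as saved from `cibadmin --query`) stores fail counts as `nvpair` entries named `fail-count-<resource>` under each `node_state` element, that a value of `INFINITY` corresponds to 1000000, and that the file path passed on the command line is whatever a wrapper chose. The function names are hypothetical, and some Pacemaker versions may name the attributes differently.

```python
import sys
import xml.etree.ElementTree as ET

PREFIX = "fail-count-"   # assumed attribute naming in the status section
INFINITY = 1000000       # assumed numeric value of Pacemaker's INFINITY


def failcounts(cib_xml):
    """Return {(node, resource): count} for every non-zero fail count."""
    counts = {}
    root = ET.fromstring(cib_xml)
    for state in root.iter("node_state"):
        node = state.get("uname") or state.get("id", "unknown")
        for nv in state.iter("nvpair"):
            name = nv.get("name", "")
            if not name.startswith(PREFIX):
                continue
            raw = nv.get("value", "0").lstrip("+")
            count = INFINITY if raw.upper() == "INFINITY" else int(raw)
            if count > 0:
                counts[(node, name[len(PREFIX):])] = count
    return counts


def format_report(counts):
    """First line: total fail count. Following lines: node, resource, count."""
    lines = [str(sum(counts.values()))]
    for (node, resource), count in sorted(counts.items()):
        lines.append(f"{node} {resource} {count}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: the argument is a file holding the saved status XML.
    with open(sys.argv[1]) as handle:
        print(format_report(failcounts(handle.read())))
```

Under these assumptions, the first output line is a single number that a check such as check_snmp could compare against a threshold, while the remaining lines identify each node and resource with a non-zero fail count. Working out why a resource failed would still require reading the logs.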
I was trying to integrate crm_mon with SNMP Trap Translator (snmptt), but haven't had luck with it either.
I posted details in another thread.
The lack of an "out-of-the-box" monitoring solution for Pacemaker is a major deficiency in my daily use, and I am sure I am not alone.
Maybe one is out there, but Chapter 7 of "Pacemaker Explained" is yet to be written.
Thanks,
Vadym