[Pacemaker] How to send email-notification on failure of resource in cluster frame work

Wed Mar 30 11:37:44 UTC 2011

On Mar 29, 2011, at 11:34 PM, Michael Schwartzkopff wrote:

>> On Mar 29, 2011 6:12 AM, "Michael Schwartzkopff" <misch at clusterbau.com> wrote:
>>>> On Tue, Mar 29, 2011 at 3:29 AM, Vadym Chepkov <vchepkov at gmail.com> wrote:
>>>>> On Mar 24, 2011, at 12:46 AM, Rakesh K wrote:
>>>>>> Hi all,
>>>>>> 
>>>>>> Is there any way to send email notifications when a resource fails
>>>>>> in the cluster framework?
>>>>>> 
>>>>>> I was going through the "Pacemaker Explained" document provided on
>>>>>> the website www.clusterlabs.org, but there is no content in
>>>>>> Chapter 7, which covers sending email notification events.
>>>>>> 
>>>>>> Can anybody help me with this?
>>>>>> 
>>>>>> For now I am using crm_mon --daemonize --as-html <path to file>
>>>>>> to maintain the HA status in an HTML file.
>>>>>> 
>>>>>> Is there any other approach for sending email notifications?
>>>>> 
>>>>> Last time I checked, crm_mon is not well suited for this purpose.
>>>>> 
>>>>> crm_mon has the following option:
>>>>> 
>>>>>      -T, --mail-to=value
>>>>>             Send mail alerts to this user. See also
>>>>>             --mail-from, --mail-host, --mail-prefix
>>>>> 
>>>>> But you will end up with an obscene amount of e-mail; I was blocked
>>>>> by gmail when I tried to use it once :) For one resource failure
>>>>> you will get 4 e-mails: monitor, stop, start, monitor. Now imagine if
>>>>> it was the most significant member of a group or, worse, a node
>>>>> failure...
>>>>> 
>>>>> nagios would be better suited for this purpose but, unfortunately,
>>>>> crm_mon has been broken
>>>>> (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2344) for
>>>>> quite a while.
>>>> 
>>>> The fix is going to have to come from the community; I don't have any
>>>> knowledge of nagios.
>>>> 
>>>>> I have yet to find a good monitoring solution for pacemaker; hopefully
>>>>> somebody has had more success and will share.
>>> 
>>> Use SNMP. It is the standard protocol for monitoring. Add an "extend" line
>>> to your snmpd.conf to call a script that returns the number of failcounts.
>>> You can easily monitor this with any NMS. For nagios, use check_snmp.
>> 
>> I'm afraid it won't be able to tell more than "stuff happened" :(
>> Would it?
> 
> Yes. Like a good NMS always does. To analyse the error you still have to read 
> the logs yourself.
> 

What I meant was, I can't see how one "extend" line will be able to supply specifics about exactly which resource has failed.
Would you kindly share an example?
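
To make the question concrete, the most I can picture behind such an "extend" line is a script along the lines of the sketch below. Treat it as a guess, not a tested setup: the function names and the regular expression are made up for illustration, and it assumes that `crm_mon -1 -f` prints one line per resource containing `fail-count=N`. It prints the total on the first line and then one `name=count` line per failed resource, so the per-resource detail is only there if the NMS reads past the first line.

```python
#!/usr/bin/env python3
"""Hypothetical helper for an snmpd "extend" line: report Pacemaker fail counts."""
import re
import subprocess

# Assumed layout of a fail-count line in `crm_mon -1 -f` output, for example:
#    my_rsc: migration-threshold=1000000 fail-count=2
FAILCOUNT_RE = re.compile(r"^[ \t]*(\S+):.*\bfail-count=(\d+)", re.MULTILINE)


def parse_failcounts(text):
    """Return {resource_name: fail_count} for resources with a non-zero count."""
    counts = {}
    for name, value in FAILCOUNT_RE.findall(text):
        if int(value) > 0:
            # The same resource may be listed under several nodes; add them up.
            counts[name] = counts.get(name, 0) + int(value)
    return counts


def main():
    try:
        # -1: print the status once and exit; -f: include fail counts.
        result = subprocess.run(
            ["crm_mon", "-1", "-f"],
            capture_output=True, text=True, check=False, timeout=10,
        )
    except (OSError, subprocess.SubprocessError) as exc:
        print(f"UNKNOWN: could not run crm_mon: {exc}")
        return 3
    counts = parse_failcounts(result.stdout)
    # First line: the total, which is all a plain numeric check would see.
    print(sum(counts.values()))
    # Following lines: which resources actually failed.
    for name, count in sorted(counts.items()):
        print(f"{name}={count}")
    return 0


if __name__ == "__main__":
    main()
```

If the script were installed at a hypothetical path such as /usr/local/bin/pcmk_failcounts.py, the snmpd.conf entry would presumably be `extend pcmk-failcounts /usr/local/bin/pcmk_failcounts.py`. Even then the result is a count per resource, not the reason for the failure.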

I was trying to integrate crm_mon with SNMP Trap Translator (snmptt), but haven't had any luck with it either.
I posted details in another thread.

The lack of an "out-of-the-box" monitoring solution for pacemaker is a major deficiency in my daily use; I am sure I am not alone.
Maybe it's out there, but Chapter 7 of "Pacemaker Explained" is yet to be written.

Thanks,
Vadym