[Pacemaker] Mail notification for fencing action

imnotpc imnotpc at rock3d.net
Wed Jun 15 18:24:17 CET 2011


On Wednesday, June 15, 2011 12:18:52 Dejan Muhamedagic wrote:
> Hi,
> 
> On Wed, Jun 15, 2011 at 06:52:21AM -0400, imnotpc wrote:
> > On Tuesday, June 14, 2011 07:17:41 Dejan Muhamedagic wrote:
> > > Hi,
> > > 
> > > On Mon, Jun 13, 2011 at 03:30:03PM -0400, imnotpc wrote:
> > > > I've created a group containing the primary RA and MailTo as the
> > > > second resource. This works as expected and sends an e-mail when the
> > > > primary resource stops or starts. I'd like to configure Pacemaker to
> > > > send an e-mail any time a node goes down, regardless of whether any
> > > > resource is currently running on it. I found nothing useful on
> > > > Google, and I've tried every configuration I can think of, but I
> > > > can't figure this out. Can the MailTo resource be configured to do
> > > > what I want?
> > > 
> > > ClusterMon perhaps.
> > 
> > Thanks Dejan, I hadn't tried that. The problem with both of these is that
> > if the node running MailTo or ClusterMon goes down, it can't send the
> > e-mail and the node gets fenced, defeating the whole purpose. Cloning
> > ClusterMon works, but it sends a blizzard of e-mails from every node each
> > time an event occurs. I think the best answer would be for the DC to
> > handle this sort of
> 
> The DC can also fail. But I guess you meant that a
> membership change should generate a message. I suppose we
> should have something like that with SNMP, but I don't know what
> its current status is.

What I was thinking is that the DC is never fenced and there is only one
instance (hopefully) at a time. Conceptually, those are the same requirements
as for a good notification system. I don't see how there can ever be a good
solution that runs as a resource.
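Short of building this into the DC, the idea can be approximated with a cron job: run it on every node, but let only the node that is currently DC send mail. If the DC itself dies, the survivors elect a new DC, which then takes over the reporting. A minimal sketch, assuming the DC name can be read from the `Current DC:` line of `crm_mon -1` and that it matches `uname -n` on that node; the layout of that line differs between Pacemaker versions, so check it against your own output. `dc_name` and `i_am_dc` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers: let only the Designated Controller (DC) send mail.

# Print the DC's node name from `crm_mon -1` output read on stdin.
# Assumes a line such as "Current DC: node1 - partition with quorum";
# anything after the name (e.g. "(version ...)") is ignored.
dc_name() {
	awk '/^Current DC:/ { print $3; exit }'
}

# Exit status 0 only when this node is the current DC.
i_am_dc() {
	dc=$(crm_mon -1 | dc_name)
	[ -n "$dc" ] && [ "$dc" = "$(uname -n)" ]
}
```

Placed just before the loop that sends mail in the cron script later in this message, `i_am_dc || exit 0` lets every node keep clearing its own stale lock files while only the DC mails. Since each node keeps its own lock files, a newly elected DC will mail once more any warnings that were already active when it took over.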
 
> 
> > thing. I'm trying to hack together a script that runs from a cron job on
> > each machine, but even if I get that to work it'll be an ugly solution
> > and I'd still get at least one e-mail from each functioning node.
> 
> Yes. So, if the node where ClusterMon runs fails, the resource
> is going to be moved elsewhere. But it doesn't know where it
> came from or why. Tricky. Perhaps it should be modified to send
> email on every start?

I have a working script now that does what I want. It has to run on each node,
so each good node will still send an e-mail, but by changing the keywords it
greps for, you can explicitly choose which notifications get mailed.
Interestingly, if the bad node is still somewhat functional, it will rat itself
out and report that it's out of contact. I run it every minute from cron, so I
also had to add a way to save the cluster state; otherwise it would send a new
e-mail every minute.
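The saved state amounts to one lock file per warning: mail only when the file does not exist yet, and delete the file once the warning clears. A minimal sketch of the "is this warning new?" check as a function, assuming the warning has already been turned into a file-name-safe string (spaces replaced by underscores, as the script below does); `warn_is_new` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper for the lock-file scheme: one file per warning.

# warn_is_new LOCKDIR WARNING
# Exit status 0 the first time WARNING is seen, recording it by creating
# LOCKDIR/WARNING; exit status 1 while that lock file already exists.
warn_is_new() {
	# If the lock directory can't be created, report "not new" so that
	# a broken setup doesn't mail on every run.
	mkdir -p "$1" || return 1
	# Already mailed on an earlier run.
	[ -f "$1/$2" ] && return 1
	# Record the warning; the function returns touch's status (0 on success).
	touch "$1/$2"
}
```

With a crontab entry such as `* * * * * /usr/local/sbin/crm_notify` (the path is only an example), a warning that persists is mailed once instead of sixty times an hour; deleting the lock file when the warning clears is what lets it be mailed again if it comes back later.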

I'm pretty bad at writing scripts and it's not pretty, but you can at least 
see what I am trying to do here. I'm open to suggestions for a better way to 
do this.

Jeff


#!/bin/sh
#
# Mail a notice for each new crm_mon warning. Runs from cron on every node.
# A lock file per warning keeps the same warning from being mailed on every
# run; the lock file is removed once the warning clears.

PATH=/bin:/usr/bin:/usr/sbin
export PATH

# List of crm_mon keywords you wish to receive e-mails for.
warnlist="failed|OFFLINE|Stopped|UNCLEAN"

# Lock file directory, mail recipient and the name of this node
# ($HOSTNAME is a bash variable and is not set by a plain /bin/sh).
lockdir=/tmp/crm_mon
mailto="admin@your.domain"
host=$(uname -n)

mkdir -p "$lockdir" || exit 1

# Take a single snapshot of the current warnings, one per line, with spaces
# (and any slashes) turned into underscores so each line is a valid file name.
warnings=$(crm_mon -1 | grep -E -w -e "$warnlist" | tr ' /' '__')

# Remove lock files for warnings that have returned to normal.
for lockfile in "$lockdir"/*
do
	# Skip the unexpanded pattern when the directory is empty.
	[ -f "$lockfile" ] || continue

	# grep -x -F: the file name must equal a whole warning line exactly.
	if ! printf '%s\n' "$warnings" | grep -x -F -q -e "${lockfile##*/}"
	then
		rm -f "$lockfile"
	fi
done

# Send an e-mail for each new warning and create its lock file.
printf '%s\n' "$warnings" | while IFS= read -r warning
do
	# Empty snapshot, or a warning that was already mailed: nothing to do.
	[ -n "$warning" ] || continue
	[ -f "$lockdir/$warning" ] && continue

	echo "$warning" | mailx -s "$host - Notice" "$mailto"
	touch "$lockdir/$warning"
done

exit 0