[Pacemaker] ClusterMon - using 'crm resource failcount ...' in external agent

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Jul 30 08:30:35 EDT 2013


Hi,

On Tue, Jul 30, 2013 at 12:56:00PM +0200, D.Gossrau wrote:
> Hi,
> 
> I'm trying to get the failcount of a resource (crm resource
> failcount <rsc> show <node>) in a script that ClusterMon runs as
> an external agent. The configuration is as follows:
> 
> # resource Cluster Monitor
> primitive resClusterMon ocf:pacemaker:ClusterMon \
>     params user="root" \
>     extra_options="-fn --external-agent=/tmp/test.sh" \
>     pidfile="/var/run/clusterMon.pid" \
>     htmlfile="/usr/share/tomcat/webapps/andphone/ClusterStatus.html"
> 
> 
> shell script /tmp/test.sh:
> #!/bin/bash
> LOGFILE="/var/log/clusterChange.log"
> echo "${DATE} ($$): ${CRM_notify_node}, ${CRM_notify_rsc}, \
> ${CRM_notify_task}, ${CRM_notify_desc}, ${CRM_notify_rc}" >> ${LOGFILE}
> 
> /usr/sbin/crm resource failcount ${CRM_notify_rsc} show \
>     ${CRM_notify_node} >> ${LOGFILE} 2>&1
> 
> exit
> 
> 
> The following error occurs:
> 
> Traceback (most recent call last):
>   File "/usr/sbin/crm", line 33, in <module>
>     from crm import main
>   File "/usr/lib64/python2.6/site-packages/crm/main.py", line 22, in <module>
>     from utils import *
>   File "/usr/lib64/python2.6/site-packages/crm/utils.py", line 28, in <module>
>     from msg import *
>   File "/usr/lib64/python2.6/site-packages/crm/msg.py", line 158, in <module>
>     user_prefs = UserPrefs.getInstance()
>   File "/usr/lib64/python2.6/site-packages/crm/singletonmixin.py", line 202, in getInstance
>     _createSingletonInstance(cls, lstArgs, dctKwArgs)
>   File "/usr/lib64/python2.6/site-packages/crm/singletonmixin.py", line 134, in _createSingletonInstance
>     instance.__init__(*lstArgs, **dctKwArgs)
>   File "/usr/lib64/python2.6/site-packages/crm/userprefs.py", line 57, in __init__
>     self.editor = find_program("EDITOR","vim","vi","emacs","nano")
>   File "/usr/lib64/python2.6/site-packages/crm/userprefs.py", line 39, in find_program
>     if is_program(prog):
>   File "/usr/lib64/python2.6/site-packages/crm/userprefs.py", line 34, in is_program
>     return subprocess.call("which %s >/dev/null 2>&1"%prog, shell=True) == 0
>   File "/usr/lib64/python2.6/subprocess.py", line 480, in call
>     return p.wait(timeout=timeout)
>   File "/usr/lib64/python2.6/subprocess.py", line 1296, in wait
>     pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
>   File "/usr/lib64/python2.6/subprocess.py", line 462, in _eintr_retry_call
>     return func(*args)
> OSError: [Errno 10] No child processes
> 
> 
> If I call test.sh manually after setting the environment variables
> it uses (CRM_notify_*), the failcounts are displayed correctly.
> 
> Is there something wrong in my environment? Could somebody enlighten me?

I've never seen this before. It could be a subprocess issue related
to SIGCHLD handlers, which was fixed at the end of 2010 (see
http://bugs.python.org/issue9127). Maybe "trap '' CHLD" in your
script would help (or whatever other way there is to ignore SIGCHLD).
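
For background on how "[Errno 10] No child processes" can come about
in general, here is a minimal Python 3 sketch. It relies on documented
Linux behaviour: while SIGCHLD is set to SIG_IGN, the kernel reaps
children itself, so a later waitpid() fails with ECHILD. Whether this
is what happens when ClusterMon starts the external agent is an
assumption, not something confirmed in this thread, and the function
name is made up for illustration.

```python
import errno
import os
import signal


def wait_with_sigchld_ignored():
    """Fork a child while SIGCHLD is ignored and return the errno from waitpid().

    Returns None if waitpid() unexpectedly succeeds.
    """
    # Ignore SIGCHLD, remembering the previous disposition so it can be restored.
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately without running any cleanup handlers.
            os._exit(0)
        try:
            # Parent: the kernel reaps the child on our behalf, so there is
            # nothing left to wait for and the call fails with ECHILD.
            os.waitpid(pid, 0)
        except ChildProcessError as exc:
            return exc.errno
        return None
    finally:
        signal.signal(signal.SIGCHLD, previous)


if __name__ == "__main__":
    err = wait_with_sigchld_ignored()
    print(err, errno.errorcode.get(err))
```

On Linux, ECHILD is errno 10, the same number shown in the traceback.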

BTW, that part of crmsh has since been changed: it no longer runs
which(1) in a subprocess, but checks for the program's existence
in pure Python.
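
A pure Python check of that kind could look roughly like the sketch
below. This is not the actual crmsh code, only an illustration of the
idea: walk the directories in PATH and test for an executable file,
without spawning any child process. The function name mirrors the
is_program() seen in the traceback, but the signature and return value
here are assumptions.

```python
import os


def is_program(prog, path=None):
    """Return the full path of prog if it is an executable file in PATH, else None."""
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    for directory in path.split(os.pathsep):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or os.curdir, prog)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    for name in ("vim", "vi", "emacs", "nano"):
        print(name, is_program(name))
```

Since no subprocess is started, there is nothing to wait for, so the
waitpid() call that fails in the traceback is never reached. On
Python 3.3 and later, shutil.which() from the standard library does a
similar lookup.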

Thanks,

Dejan

> My environment :
> 
> Scientific Linux 6.3
> pacemaker-cluster-libs-1.1.7-6.el6.x86_64
> pacemaker-cli-1.1.7-6.el6.x86_64
> pacemaker-libs-1.1.7-6.el6.x86_64
> pacemaker-1.1.7-6.el6.x86_64
> corosynclib-1.4.1-7.el6.x86_64
> corosync-1.4.1-7.el6.x86_64
> resource-agents-3.9.2-12.el6.x86_64
> 
> Thanks for any hints !
> 
> Kind regards,
> Detlef