[Pacemaker] "rc=-2" when executing monitors
Roberto Suarez Soto
robe at allenta.com
Wed Aug 26 11:15:42 UTC 2009
Hi,
we've recently deployed a two-node cluster using Pacemaker, and we're
seeing something strange in the logs: from time to time, a monitor operation
fails with "rc=-2". Here is an example:
crmd: [5506]: info: process_lrm_event: LRM operation rinetd_monitor_60000
(call=427, rc=-2, cib-update=0, confirmed=true) Cancelled unknown exec error
crmd: [5506]: info: process_lrm_event: LRM operation rinetd_monitor_60000
(call=464, rc=-2, cib-update=0, confirmed=true) Cancelled unknown exec error
This happens not only with LSB resources but also with OCF ones. In
both cases we wrote the scripts ourselves, so we may be doing something wrong.
In one case, we use "ps" and "grep" to find the process we're monitoring:
ps ax | grep -v grep | egrep "ftp-proxy .* ${OCF_RESKEY_config}" >/dev/null 2>/dev/null
if [ $? -eq 0 ];then
return $OCF_SUCCESS
else
return $OCF_NOT_RUNNING
fi
This is, as you may guess, an OCF agent for ftp-proxy. I know that we
should return a different error code instead of OCF_NOT_RUNNING for every
non-zero status, and we've already fixed that. But I wanted to show you the
agent exactly as it was when the failures occurred, in case it is related.
In fact, returning OCF_NOT_RUNNING (rc=7, IIRC) for every non-zero status
makes it even stranger that rc=-2 shows up in the logs.
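To illustrate the point about return codes, here is a minimal sketch of a monitor that separates "not running" from "the check itself failed". It is only an illustration, not necessarily the code we ended up with: map_grep_status is a hypothetical helper name, and the numeric defaults are the standard OCF values that a real agent would normally get by sourcing ocf-shellfuncs.

```shell
#!/bin/sh
# Standard OCF return codes. A real agent would normally source
# ocf-shellfuncs instead; the defaults here keep the sketch self-contained.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"

# Hypothetical helper: translate the exit status of the final grep.
# grep exits 0 on a match, 1 on no match, and >1 when an error occurred.
map_grep_status() {
    case "$1" in
        0) return "$OCF_SUCCESS" ;;
        1) return "$OCF_NOT_RUNNING" ;;
        *) return "$OCF_ERR_GENERIC" ;;
    esac
}

# Same pipeline as the original monitor; only the status mapping differs.
ftp_proxy_monitor() {
    ps ax | grep -v grep | grep -E "ftp-proxy .* ${OCF_RESKEY_config}" >/dev/null 2>&1
    map_grep_status $?
}
```

Even with this mapping the function can only return 0, 1 or 7, so it would not explain an rc=-2 either.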
And this is the monitor snippet for an LSB script for rinetd, the one
whose errors I pasted above:
pidof rinetd > /dev/null
R=$?
case "$R" in
0)
echo "rinetd is running (PID `pidof 'rinetd'`)"
exit 0
;;
*)
echo "rinetd is NOT running (rc=$R)"
exit 3
;;
esac
As you can see, this script also has no way of returning anything
other than 0 or 3.
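For anyone who wants to exercise that logic outside the cluster, the status check can be sketched as a standalone function. This is an assumption-laden variant, not the script itself: lsb_status is a hypothetical name, it takes the process name as an argument, and it calls pidof only once so that the PID it prints is the one it actually tested.

```shell
#!/bin/sh
# Hypothetical standalone version of the status branch shown above.
# Returns 0 if a process with the given name exists, 3 otherwise
# (3 is the LSB "program is not running" status code).
lsb_status() {
    name=$1
    # Capture the PIDs and the exit status from a single pidof call.
    pids=$(pidof "$name" 2>/dev/null)
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "$name is running (PID $pids)"
        return 0
    fi
    echo "$name is NOT running (rc=$rc)"
    return 3
}
```

Calling `lsb_status rinetd; echo $?` by hand on each node should only ever print 0 or 3.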
What could be causing these strange failures? If you need more
information or testing, please ask.
Thanks in advance.
--
Roberto Suarez Soto Allenta Consulting
robe at allenta.com www.allenta.com
+34 881 922 600