[Pacemaker] rsc_op: Hard error - res_Nagios_monitor_0 failed with rc=6: Preventing res_Nagios from re-starting anywhere in the cluster

Wed Jun 23 11:19:46 EDT 2010

Hi,

I have a two-node cluster up and running, and right now I am trying to
configure a Nagios3 resource. I have already fixed the nagios init
script, as it didn't pass the LSB compatibility checks described here:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-lsb.html
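
For reference, the check sequence from that appendix could be scripted roughly as follows. This is only a sketch: lsb_check and expect_rc are hypothetical helper names, not part of Pacemaker, and the expected codes (0 for start/stop and for status while running, 3 for status while stopped) are the ones the LSB appendix lists. Note that it really starts and stops the service it is pointed at.

```shell
#!/bin/sh
# Sketch of the LSB start/status/stop check sequence.
# lsb_check and expect_rc are hypothetical helper names.

expect_rc() {
    # usage: expect_rc <expected-rc> <init-script> <action>
    expected=$1; script=$2; action=$3
    rc=0
    "$script" "$action" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -ne "$expected" ]; then
        echo "FAIL: $action returned $rc, expected $expected"
        return 1
    fi
    echo "ok:   $action returned $rc"
}

lsb_check() {
    # usage: lsb_check /etc/init.d/nagios3
    script=$1
    expect_rc 0 "$script" stop   || return 1   # get to a known stopped state
    expect_rc 0 "$script" start  || return 1   # start while stopped
    expect_rc 0 "$script" status || return 1   # status while running
    expect_rc 0 "$script" start  || return 1   # start while already running
    expect_rc 0 "$script" stop   || return 1   # stop while running
    expect_rc 3 "$script" status || return 1   # status while stopped
    expect_rc 0 "$script" stop   || return 1   # stop while already stopped
}
```

Running lsb_check /etc/init.d/nagios3 on each node separately shows whether the script behaves the same everywhere. The appendix's last check (status of a failed service) needs a simulated failure and is left out here.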

I only needed to make sure the pid file gets removed when the stop
function is called. After this small change the script passed all the LSB checks.
Below is the error message I get:

root@pilot01-node2:/var/run/nagios3# crm_verify -LV
crm_verify[7094]: 2010/06/23_16:37:27 ERROR: unpack_rsc_op: Hard error - res_Nagios_monitor_0 failed with rc=6: Preventing res_Nagios from re-starting anywhere in the cluster
crm_verify[7094]: 2010/06/23_16:37:27 WARN: native_color: Resource res_Nagios cannot run anywhere
Warnings found during check: config may not be valid

I tried to find out what an init script must provide so that it can be
used in Pacemaker, but I only found the LSB compatibility hints on the
Pacemaker website. I think I configured the primitive wrong, or maybe the
init script is still wrong? It fails even when I configure it with an
op monitor action, and even a crm resource cleanup res_Nagios doesn't
help me start the resource.
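
To see what the init script reports on a given node, something like the following could be used. It is a sketch under assumptions: status_rc_meaning and probe_status are hypothetical helper names, and the labels follow the LSB exit-code tables as I understand them (0 to 4 are defined for status; 6 is only defined for the other actions, where it means "program is not configured", which matches the "not configured" label in the failed action list).

```shell
#!/bin/sh
# Sketch: run an init script's status action and label its exit code.
# status_rc_meaning and probe_status are hypothetical helper names.

status_rc_meaning() {
    case "$1" in
        0) echo "running" ;;
        1) echo "dead, but pid file exists" ;;
        2) echo "dead, but lock file exists" ;;
        3) echo "not running" ;;
        4) echo "status unknown" ;;
        6) echo "not configured (LSB code for non-status actions)" ;;
        *) echo "not defined for status by LSB" ;;
    esac
}

probe_status() {
    # usage: probe_status /etc/init.d/nagios3
    rc=0
    "$1" status >/dev/null 2>&1 || rc=$?
    echo "$1 status: rc=$rc ($(status_rc_meaning "$rc"))"
    return "$rc"
}
```

Since the failed probe was recorded for pilot01-node1, running probe_status /etc/init.d/nagios3 there (with Nagios stopped) should show whether the script returns 3 or something else on that node.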

I can run Nagios manually on the active node. I linked all shared
directories to my cluster storage device like this:

root@pilot01-node2:/etc# ll /var/lib/nagios3* /etc/nagios*
lrwxrwxrwx 1 root   root    25 23. Jun 13:54 /etc/nagios3 -> /mnt/cluster/etc/nagios3/
lrwxrwxrwx 1 root   root    29 23. Jun 14:04 /var/lib/nagios3 -> /mnt/cluster/var/lib/nagios3/

/etc/nagios3_bak:
total 88K
drwxr-xr-x  4 root root    146 23. Jun 13:54 .
drwxr-xr-x 75 root root   4,0K 23. Jun 17:08 ..
-rw-r--r--  1 root root   1,9K 30. Jun 2009  apache2.conf
-rw-r--r--  1 root root    11K 23. Jun 13:49 cgi.cfg
-rw-r--r--  1 root root   2,4K  2. Jul 2009  commands.cfg
drwxr-xr-x  2 root root   4,0K  7. Jun 19:16 conf.d
-rw-r--r--  1 root root     20 23. Jun 13:49 htpasswd.users
-rw-r--r--  1 root root    42K  2. Jul 2009  nagios.cfg
-rw-r-----  1 root nagios 1,3K 30. Jun 2009  resource.cfg
drwxr-xr-x  2 root root   4,0K  7. Jun 19:16 stylesheets

/etc/nagios-plugins:
total 12K
drwxr-xr-x  3 root root   19  7. Jun 19:16 .
drwxr-xr-x 75 root root 4,0K 23. Jun 17:08 ..
drwxr-xr-x  2 root root 4,0K  7. Jun 19:16 config

/var/lib/nagios3_bak:
total 20K
drwxr-x---  4 nagios nagios     47 23. Jun 14:02 .
drwxr-xr-x 33 root   root     4,0K 23. Jun 14:04 ..
-rw-------  1 nagios www-data  14K 23. Jun 14:02 retention.dat
drwx------  2 nagios www-data    6  2. Jul 2009  rw
drwxr-x---  3 nagios nagios     25  7. Jun 19:16 spool
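
A layout like the one listed above could be produced roughly as follows. This is a sketch, not the exact commands I ran: relocate_to_shared and link_resolves are hypothetical helper names, and the _bak suffix simply mirrors the directory names in the listing. The resulting links only resolve while the shared filesystem is mounted at /mnt/cluster.

```shell
#!/bin/sh
# Sketch: copy a local directory to shared storage, keep the original
# as <dir>_bak, and leave a symlink in its place.
# relocate_to_shared and link_resolves are hypothetical helper names.

relocate_to_shared() {
    # usage: relocate_to_shared /etc/nagios3 /mnt/cluster/etc/nagios3
    local_dir=$1
    shared_dir=$2
    if [ ! -d "$local_dir" ] || [ -L "$local_dir" ]; then
        echo "skip: $local_dir is not a plain directory"
        return 1
    fi
    mkdir -p "$shared_dir" || return 1
    cp -a "$local_dir/." "$shared_dir/" || return 1   # copy contents, keep owners and modes
    mv "$local_dir" "${local_dir}_bak" || return 1    # keep the original as a backup
    ln -s "$shared_dir" "$local_dir"
}

link_resolves() {
    # succeeds only if the path (following symlinks) currently exists
    [ -e "$1" ]
}
```

On the node that does not hold the data, only the mv and ln -s steps would be needed. link_resolves /etc/nagios3 then gives a quick check of whether the link target is reachable on the node where it is run.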

Here is my Config.

########################
### 3. Cluster State ###
########################
============
Last updated: Wed Jun 23 17:16:33 2010
Stack: openais
Current DC: pilot01-node2 - partition with quorum
Version: 1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Node pilot01-node1: standby
Online: [ pilot01-node2 ]

Full list of resources:

 Resource Group: grp_MySQL
     res_Filesystem     (ocf::heartbeat:Filesystem):    Started pilot01-node2
     res_ClusterIP      (ocf::heartbeat:IPaddr2):       Started pilot01-node2
     res_MySQL  (lsb:mysql):    Started pilot01-node2
     res_Apache (lsb:apache2):  Started pilot01-node2
     res_ClusterMonitor (ocf::pacemaker:ClusterMon):    Started pilot01-node2
     res_Nagios (lsb:nagios3):  Stopped
 Master/Slave Set: ms_drbd_mysql0
     Masters: [ pilot01-node2 ]
     Stopped: [ drbd_pilot0:0 ]
 Clone Set: cl-pinggw
     Started: [ pilot01-node2 ]
     Stopped: [ pinggw:0 ]
Monitor-Cluster (ocf::pacemaker:ClusterMon):    Started pilot01-node1 (unmanaged) FAILED

Failed actions:
    Monitor-Cluster_stop_0 (node=pilot01-node1, call=34, rc=1, status=complete): unknown error
    res_Nagios_monitor_0 (node=pilot01-node1, call=84, rc=6, status=complete): not configured

#########################
### 4. Cluster Config ###
#########################
node pilot01-node1 \
        attributes standby="on"
node pilot01-node2 \
        attributes standby="off"
primitive Monitor-Cluster ocf:pacemaker:ClusterMon \
        params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
        params pidfile="/var/run/rlb-cluster-monitor.pid" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="100s"
primitive drbd_pilot0 ocf:linbit:drbd \
        params drbd_resource="pilot0" \
        operations $id="drbd_pilot0-operations" \
        op monitor interval="15s"
primitive pinggw ocf:pacemaker:pingd \
        params host_list="10.1.1.162" multiplier="200" \
        op monitor interval="10s"
primitive res_Apache lsb:apache2 \
        operations $id="res_Apache-operations" \
        op monitor interval="15s" timeout="20s" start-delay="15s"
primitive res_ClusterIP ocf:heartbeat:IPaddr2 \
        params iflabel="ClusterIP" ip="10.1.1.12" nic="eth0" cidr_netmask="24" \
        operations $id="res_ClusterIP_1-operations" \
        op monitor start-delay="0" interval="10s"
primitive res_ClusterMonitor ocf:pacemaker:ClusterMon \
        params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
        params pidfile="/var/run/rlb-cluster-monitor.pid" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="100s" \
        meta target-role="Started"
primitive res_Filesystem ocf:heartbeat:Filesystem \
        params fstype="xfs" directory="/mnt/cluster" device="/dev/drbd0" options="noatime,nodiratime,barrier=0"
primitive res_MySQL lsb:mysql
primitive res_Nagios lsb:nagios3 \
        operations $id="res_Nagios-operations" \
        op monitor interval="15s" timeout="20s" \
        meta target-role="Started"
group grp_MySQL res_Filesystem res_ClusterIP res_MySQL res_Apache res_ClusterMonitor res_Nagios
ms ms_drbd_mysql0 drbd_pilot0 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone cl-pinggw pinggw \
        meta globally-unique="false"
location drbd-fence-by-handler-ms_drbd_mysql0 ms_drbd_mysql0 \
        rule $id="drbd-fence-by-handler-rule-ms_drbd_mysql0" $role="Master" -inf: #uname ne pilot01-node2
location grp_MySQL-with-pinggw grp_MySQL \
        rule $id="grp_MySQL-with-pinggw-rule-1" -inf: not_defined pingd or pingd lte 0
colocation col_drbd_on_mysql inf: grp_MySQL ms_drbd_mysql0:Master
order mysql_after_drbd inf: ms_drbd_mysql0:promote grp_MySQL:start
property $id="cib-bootstrap-options" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        dc-version="1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75" \
        cluster-infrastructure="openais" \
        last-lrm-refresh="1277306106" \
        symmetric-cluster="true" \
        migration-threshold="1" \
        default-action-timeout="240s"

Thanks for your help in advance.

Sebastian
