[Pacemaker] rsc_op: Hard error - res_Nagios_monitor_0 failed with rc=6: Preventing res_Nagios from re-starting anywhere in the cluster

Thu Jun 24 07:19:11 UTC 2010

On Wed, Jun 23, 2010 at 5:19 PM, Koch, Sebastian
<Sebastian.Koch at netzwerk.de> wrote:
> Hi,
>
>
>
> I have a two-node cluster up and running, and right now I am trying to
> configure a Nagios3 resource. I have already fixed the nagios init
> script, as it didn't pass the LSB compatibility checks described here:
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-lsb.html
>
>
>
> I just needed to make sure the pid file gets removed when the stop function is
> called. After this small change the script passed all the LSB checks. Below is
> the error message:
>
>
>
> root@pilot01-node2:/var/run/nagios3# crm_verify -LV
>
> crm_verify[7094]: 2010/06/23_16:37:27 ERROR: unpack_rsc_op: Hard error -
> res_Nagios_monitor_0 failed with rc=6: Preventing res_Nagios from
> re-starting anywhere in the cluster

It looks like the script is still failing the fifth LSB check from the URL above:
"Did the command print result: 3"
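That check can be repeated by hand. Below is a minimal sketch, assuming the init script is installed as /etc/init.d/nagios3 (the path is an assumption; the thread only names the lsb:nagios3 resource). The helper name check_lsb_status is made up for illustration. With the service stopped, the status action should exit with 3.

```shell
#!/bin/sh
# check_lsb_status SCRIPT EXPECTED   (hypothetical helper, not part of Pacemaker)
# Runs "SCRIPT status" and compares its exit code with EXPECTED.
check_lsb_status() {
    script=$1
    expected=$2
    rc=0
    "$script" status >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq "$expected" ]; then
        echo "OK: status returned $rc"
        return 0
    fi
    echo "FAIL: status returned $rc, expected $expected"
    return 1
}

# Usage (assumed path; run as root on the node):
#   /etc/init.d/nagios3 stop
#   check_lsb_status /etc/init.d/nagios3 3
```

If the helper reports anything other than 3 for a stopped service, the status branch of the init script is the place to look.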

>
> crm_verify[7094]: 2010/06/23_16:37:27 WARN: native_color: Resource
> res_Nagios cannot run anywhere
>
> Warnings found during check: config may not be valid
>
>
>
> I tried to find out what an init script must provide to be usable
> in Pacemaker, but I only found the LSB compatibility hints on the
> Pacemaker website. I think I configured the primitive wrong, or maybe the
> init script is still wrong? Even if I configure it with an op monitor action
> it fails, and even a crm resource cleanup res_Nagios doesn't help
> start the resource.
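For reference, the LSB specification defines the exit codes of the status action as 0 (running), 1 (dead, but the pid file exists) and 3 (not running). Here is a minimal sketch of a status helper that follows those codes. The function name lsb_status and the pid file path in the usage comment are assumptions for illustration, not taken from the nagios3 package.

```shell
#!/bin/sh
# lsb_status PIDFILE   (hypothetical helper)
# Exit codes follow the LSB "status" convention:
#   0 = running, 1 = dead but pid file exists, 3 = not running
lsb_status() {
    pidfile=$1
    # No pid file: treat the service as cleanly stopped.
    if [ ! -f "$pidfile" ]; then
        return 3
    fi
    pid=$(cat "$pidfile" 2>/dev/null)
    # kill -0 only tests whether the process can be signalled;
    # run as root so a process owned by another user is not misreported.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    return 1
}

# Usage inside an init script (assumed pid file path):
#   status) lsb_status /var/run/nagios3/nagios3.pid; exit $? ;;
```

The important case for the cluster is the stopped one: a stop action that removes the pid file and a status action that then returns 3.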
>
>
>
> I can run Nagios manually on the active node. I linked all shared
> directories to my cluster storage device like this:
>
>
>
> root@pilot01-node2:/etc# ll /var/lib/nagios3* /etc/nagios*
> lrwxrwxrwx 1 root   root    25 23. Jun 13:54 /etc/nagios3 -> /mnt/cluster/etc/nagios3/
> lrwxrwxrwx 1 root   root    29 23. Jun 14:04 /var/lib/nagios3 -> /mnt/cluster/var/lib/nagios3/
>
> /etc/nagios3_bak:
> insgesamt 88K
> drwxr-xr-x  4 root root    146 23. Jun 13:54 .
> drwxr-xr-x 75 root root   4,0K 23. Jun 17:08 ..
> -rw-r--r--  1 root root   1,9K 30. Jun 2009  apache2.conf
> -rw-r--r--  1 root root    11K 23. Jun 13:49 cgi.cfg
> -rw-r--r--  1 root root   2,4K  2. Jul 2009  commands.cfg
> drwxr-xr-x  2 root root   4,0K  7. Jun 19:16 conf.d
> -rw-r--r--  1 root root     20 23. Jun 13:49 htpasswd.users
> -rw-r--r--  1 root root    42K  2. Jul 2009  nagios.cfg
> -rw-r-----  1 root nagios 1,3K 30. Jun 2009  resource.cfg
> drwxr-xr-x  2 root root   4,0K  7. Jun 19:16 stylesheets
>
> /etc/nagios-plugins:
> insgesamt 12K
> drwxr-xr-x  3 root root   19  7. Jun 19:16 .
> drwxr-xr-x 75 root root 4,0K 23. Jun 17:08 ..
> drwxr-xr-x  2 root root 4,0K  7. Jun 19:16 config
>
> /var/lib/nagios3_bak:
> insgesamt 20K
> drwxr-x---  4 nagios nagios     47 23. Jun 14:02 .
> drwxr-xr-x 33 root   root     4,0K 23. Jun 14:04 ..
> -rw-------  1 nagios www-data  14K 23. Jun 14:02 retention.dat
> drwx------  2 nagios www-data    6  2. Jul 2009  rw
> drwxr-x---  3 nagios nagios     25  7. Jun 19:16 spool
>
>
>
> Here is my Config.
>
>
>
> ########################
> ### 3. Cluster State ###
> ########################
>
> ============
> Last updated: Wed Jun 23 17:16:33 2010
> Stack: openais
> Current DC: pilot01-node2 - partition with quorum
> Version: 1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
>
> Node pilot01-node1: standby
> Online: [ pilot01-node2 ]
>
> Full list of resources:
>
>  Resource Group: grp_MySQL
>      res_Filesystem     (ocf::heartbeat:Filesystem):    Started pilot01-node2
>      res_ClusterIP      (ocf::heartbeat:IPaddr2):       Started pilot01-node2
>      res_MySQL  (lsb:mysql):    Started pilot01-node2
>      res_Apache (lsb:apache2):  Started pilot01-node2
>      res_ClusterMonitor (ocf::pacemaker:ClusterMon):    Started pilot01-node2
>      res_Nagios (lsb:nagios3):  Stopped
>  Master/Slave Set: ms_drbd_mysql0
>      Masters: [ pilot01-node2 ]
>      Stopped: [ drbd_pilot0:0 ]
>  Clone Set: cl-pinggw
>      Started: [ pilot01-node2 ]
>      Stopped: [ pinggw:0 ]
> Monitor-Cluster (ocf::pacemaker:ClusterMon):    Started pilot01-node1 (unmanaged) FAILED
>
> Failed actions:
>     Monitor-Cluster_stop_0 (node=pilot01-node1, call=34, rc=1, status=complete): unknown error
>     res_Nagios_monitor_0 (node=pilot01-node1, call=84, rc=6, status=complete): not configured
>
> #########################
> ### 4. Cluster Config ###
> #########################
>
> node pilot01-node1 \
>         attributes standby="on"
> node pilot01-node2 \
>         attributes standby="off"
> primitive Monitor-Cluster ocf:pacemaker:ClusterMon \
>         params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
>         params pidfile="/var/run/rlb-cluster-monitor.pid" \
>         op start interval="0" timeout="90s" \
>         op stop interval="0" timeout="100s"
> primitive drbd_pilot0 ocf:linbit:drbd \
>         params drbd_resource="pilot0" \
>         operations $id="drbd_pilot0-operations" \
>         op monitor interval="15s"
> primitive pinggw ocf:pacemaker:pingd \
>         params host_list="10.1.1.162" multiplier="200" \
>         op monitor interval="10s"
> primitive res_Apache lsb:apache2 \
>         operations $id="res_Apache-operations" \
>         op monitor interval="15s" timeout="20s" start-delay="15s"
> primitive res_ClusterIP ocf:heartbeat:IPaddr2 \
>         params iflabel="ClusterIP" ip="10.1.1.12" nic="eth0" cidr_netmask="24" \
>         operations $id="res_ClusterIP_1-operations" \
>         op monitor start-delay="0" interval="10s"
> primitive res_ClusterMonitor ocf:pacemaker:ClusterMon \
>         params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
>         params pidfile="/var/run/rlb-cluster-monitor.pid" \
>         op start interval="0" timeout="90s" \
>         op stop interval="0" timeout="100s" \
>         meta target-role="Started"
> primitive res_Filesystem ocf:heartbeat:Filesystem \
>         params fstype="xfs" directory="/mnt/cluster" device="/dev/drbd0" options="noatime,nodiratime,barrier=0"
> primitive res_MySQL lsb:mysql
> primitive res_Nagios lsb:nagios3 \
>         operations $id="res_Nagios-operations" \
>         op monitor interval="15s" timeout="20s" \
>         meta target-role="Started"
> group grp_MySQL res_Filesystem res_ClusterIP res_MySQL res_Apache res_ClusterMonitor res_Nagios
> ms ms_drbd_mysql0 drbd_pilot0 \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> clone cl-pinggw pinggw \
>         meta globally-unique="false"
> location drbd-fence-by-handler-ms_drbd_mysql0 ms_drbd_mysql0 \
>         rule $id="drbd-fence-by-handler-rule-ms_drbd_mysql0" $role="Master" -inf: #uname ne pilot01-node2
> location grp_MySQL-with-pinggw grp_MySQL \
>         rule $id="grp_MySQL-with-pinggw-rule-1" -inf: not_defined pingd or pingd lte 0
> colocation col_drbd_on_mysql inf: grp_MySQL ms_drbd_mysql0:Master
> order mysql_after_drbd inf: ms_drbd_mysql0:promote grp_MySQL:start
> property $id="cib-bootstrap-options" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         dc-version="1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75" \
>         cluster-infrastructure="openais" \
>         last-lrm-refresh="1277306106" \
>         symmetric-cluster="true" \
>         migration-threshold="1" \
>         default-action-timeout="240s"
>
>
>
> Thanks for your help in advance.
>
> Sebastian