[Pacemaker] ClusterMon failing: call=220, rc=1, status=complete): unknown error
Koch, Sebastian
Sebastian.Koch at netzwerk.de
Thu Jun 24 12:14:51 UTC 2010
Hi,
I have a small issue with the ClusterMon agent. The monitor actions for this agent seem to fail (this shows up in syslog; see the output below) and I am not able to troubleshoot it. I tried to start the agent by hand on the failed node, but I don't see any startup or status errors. ClusterMon seems to fail only on the passive node, so I thought the cause might be a missing www directory or something similar, but I cannot find the error.
--------------------------------------------------------------------------------------------------------------
root@pilot01-node2:~/clustercompare# /usr/lib/ocf/resource.d/heartbeat/ClusterMon validate-all
Validate OK
root@pilot01-node2:~/clustercompare# /usr/lib/ocf/resource.d/heartbeat/ClusterMon stop; echo "res: $?"
res: 0
root@pilot01-node2:~/clustercompare# /usr/lib/ocf/resource.d/heartbeat/ClusterMon start; echo "res: $?"
res: 0
root@pilot01-node2:~/clustercompare# /usr/lib/ocf/resource.d/heartbeat/ClusterMon status; echo "res: $?"
usage: /usr/lib/ocf/resource.d/heartbeat/ClusterMon {start|stop|monitor|validate-all|meta-data}
Expects to have a fully populated OCF RA-compliant environment set.
res: 3
--------------------------------------------------------------------------------------------------------------
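The manual calls above use the heartbeat provider path and set no OCF_RESKEY_ variables, whereas the cluster runs ocf:pacemaker:ClusterMon with htmlfile and pidfile set. The agent's usage output also lists monitor rather than status, and rc 3 is the OCF code for an unimplemented action. A minimal sketch of a closer manual test follows; ra_path and run_ra are hypothetical helper names, /usr/lib/ocf is assumed as OCF_ROOT, and the parameter values are copied from the configuration in this post.

```shell
#!/bin/sh
# Sketch: call a resource agent by hand with the environment the cluster
# would normally provide. Helper names are illustrative only.
OCF_ROOT=${OCF_ROOT:-/usr/lib/ocf}

ra_path() {
    # $1 looks like ocf:pacemaker:ClusterMon; the OCF layout places the
    # script at $OCF_ROOT/resource.d/<provider>/<agent>.
    provider=$(echo "$1" | cut -d: -f2)
    agent=$(echo "$1" | cut -d: -f3)
    echo "$OCF_ROOT/resource.d/$provider/$agent"
}

run_ra() {
    # $1 = resource class (ocf:provider:agent), $2 = action
    ra=$(ra_path "$1")
    if [ ! -x "$ra" ]; then
        echo "agent not found: $ra" >&2
        return 5    # OCF_ERR_INSTALLED
    fi
    # Instance parameters are passed as OCF_RESKEY_<name>.
    OCF_ROOT="$OCF_ROOT" \
    OCF_RESOURCE_INSTANCE=Monitor-Cluster \
    OCF_RESKEY_htmlfile=/mnt/cluster/var/www/cluster-monitor.html \
    OCF_RESKEY_pidfile=/var/run/rlb-cluster-monitor.pid \
    "$ra" "$2"
}

rc=0
run_ra ocf:pacemaker:ClusterMon monitor || rc=$?
echo "res: $rc"
```

For monitor, rc 0 means the resource is running and rc 7 means it is not running; rc 1 is the generic error that the cluster reports as "unknown error". Replacing monitor with stop exercises the action that failed in the status output, but it will also stop a running instance.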
I can see that ClusterMon is started and even the HTML output works, but the error is still there.
--------------------------------------------------------------------------------------------------------------
============
Last updated: Thu Jun 24 14:02:48 2010
Stack: openais
Current DC: pilot01-node2 - partition with quorum
Version: 1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Online: [ pilot01-node1 pilot01-node2 ]
 Resource Group: grp_MySQL
     res_Filesystem (ocf::heartbeat:Filesystem): Started pilot01-node2
     res_ClusterIP (ocf::heartbeat:IPaddr2): Started pilot01-node2
     res_MySQL (lsb:mysql): Started pilot01-node2
     res_Apache (lsb:apache2): Started pilot01-node2
     res_ClusterMonitor (ocf::pacemaker:ClusterMon): Started pilot01-node2
     res_Nagios (lsb:nagios3): Started pilot01-node2
 Master/Slave Set: ms_drbd_mysql0
     Masters: [ pilot01-node2 ]
     Slaves: [ pilot01-node1 ]
 Clone Set: cl-pinggw
     Started: [ pilot01-node1 pilot01-node2 ]
Monitor-Cluster (ocf::pacemaker:ClusterMon): Started pilot01-node2 (unmanaged) FAILED
Failed actions:
    Monitor-Cluster_stop_0 (node=pilot01-node2, call=220, rc=1, status=complete): unknown error
--------------------------------------------------------------------------------------------------------------
I symlinked /var/www on both nodes to my cluster DRBD storage.
--------------------------------------------------------------------------------------------------------------
root@pilot01-node1:/mnt/cluster/var/www# ll /var/www
lrwxrwxrwx 1 root root 20 23. Jun 17:06 /var/www -> /mnt/cluster/var/www
--------------------------------------------------------------------------------------------------------------
This is my configuration.
--------------------------------------------------------------------------------------------------------------
node pilot01-node1 \
    attributes standby="off"
node pilot01-node2 \
    attributes standby="off"
primitive Monitor-Cluster ocf:pacemaker:ClusterMon \
    params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
    params pidfile="/var/run/rlb-cluster-monitor.pid" \
    op start interval="0" timeout="90s" \
    op stop interval="0" timeout="100s"
primitive drbd_pilot0 ocf:linbit:drbd \
    params drbd_resource="pilot0" \
    operations $id="drbd_pilot0-operations" \
    op monitor interval="15s"
primitive pinggw ocf:pacemaker:pingd \
    params host_list="10.1.1.162" multiplier="200" \
    op monitor interval="10s"
primitive res_Apache lsb:apache2 \
    operations $id="res_Apache-operations" \
    op monitor interval="15s" timeout="20s" start-delay="15s"
primitive res_ClusterIP ocf:heartbeat:IPaddr2 \
    params iflabel="ClusterIP" ip="10.1.1.12" nic="eth0" cidr_netmask="24" \
    operations $id="res_ClusterIP_1-operations" \
    op monitor start-delay="0" interval="10s"
primitive res_ClusterMonitor ocf:pacemaker:ClusterMon \
    params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
    params pidfile="/var/run/rlb-cluster-monitor.pid" \
    op start interval="0" timeout="90s" \
    op stop interval="0" timeout="100s" \
    meta target-role="Started"
primitive res_Filesystem ocf:heartbeat:Filesystem \
    params fstype="xfs" directory="/mnt/cluster" device="/dev/drbd0" options="noatime,nodiratime,barrier=0"
primitive res_MySQL lsb:mysql
primitive res_Nagios lsb:nagios3 \
    operations $id="res_Nagios-operations" \
    op monitor interval="15s" timeout="20s" \
    meta target-role="Started"
group grp_MySQL res_Filesystem res_ClusterIP res_MySQL res_Apache res_ClusterMonitor res_Nagios
ms ms_drbd_mysql0 drbd_pilot0 \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone cl-pinggw pinggw \
    meta globally-unique="false"
location drbd-fence-by-handler-ms_drbd_mysql0 ms_drbd_mysql0 \
    rule $id="drbd-fence-by-handler-rule-ms_drbd_mysql0" $role="Master" -inf: #uname ne pilot01-node2
location grp_MySQL-with-pinggw grp_MySQL \
    rule $id="grp_MySQL-with-pinggw-rule-1" -inf: not_defined pingd or pingd lte 0
colocation col_drbd_on_mysql inf: grp_MySQL ms_drbd_mysql0:Master
order mysql_after_drbd inf: ms_drbd_mysql0:promote grp_MySQL:start
property $id="cib-bootstrap-options" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    dc-version="1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75" \
    cluster-infrastructure="openais" \
    last-lrm-refresh="1277380951" \
    symmetric-cluster="true" \
    default-action-timeout="240s"
--------------------------------------------------------------------------------------------------------------
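Both ClusterMon primitives in the configuration above (Monitor-Cluster and res_ClusterMonitor) use the same pidfile value. Whether that is related to the failed stop is only an assumption; nothing in the output here confirms it. A minimal sketch for spotting such duplicates in a configuration dump, where dup_pidfiles is a hypothetical helper name:

```shell
#!/bin/sh
# Print every pidfile="..." value that occurs more than once in the
# crm configuration text read from stdin.
dup_pidfiles() {
    grep -o 'pidfile="[^"]*"' | sort | uniq -d
}

# Example input: the two ClusterMon primitives, reduced to the relevant lines.
dup_pidfiles <<'EOF'
primitive Monitor-Cluster ocf:pacemaker:ClusterMon \
    params pidfile="/var/run/rlb-cluster-monitor.pid"
primitive res_ClusterMonitor ocf:pacemaker:ClusterMon \
    params pidfile="/var/run/rlb-cluster-monitor.pid"
EOF
```

On a live node the same function can read the real configuration with `crm configure show | dup_pidfiles`. An empty result means no pidfile value is shared between resources.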
Sebastian Koch
NETZWERK GmbH