[Pacemaker] ClusterMon failing: call=220, rc=1, status=complete): unknown error

Koch, Sebastian Sebastian.Koch at netzwerk.de
Fri Jun 25 08:38:20 EDT 2010


Hi,

I found the error. It was failing on just one node, and it was always the passive node. I had a broken symlink from /var/www to my DRBD device. After fixing it, the ClusterMonitor runs just fine.
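For anyone hitting the same thing, a rough way to spot such a dangling link is sketched below. The helper name is_dangling is made up for illustration, and /var/www is simply the path from my setup; on the passive node the link target may be missing whenever the DRBD filesystem is not mounted there.

```shell
#!/bin/sh
# is_dangling is a hypothetical helper: succeeds when the given path is a
# symlink whose target does not exist.
is_dangling() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

# /var/www is the path from this thread; adjust for your own layout.
if is_dangling /var/www; then
    echo "/var/www is a broken symlink -> $(readlink /var/www)"
else
    echo "/var/www is not a broken symlink"
fi
```

Running this on both nodes should show which one has the broken link.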

Best Regards,
Sebastian Koch
                                                         

-----Original Message-----
From: Dejan Muhamedagic [mailto:dejanmm at fastmail.fm]
Sent: Thursday, 24 June 2010 15:33
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] ClusterMon failing: call=220, rc=1, status=complete): unknown error

On Thu, Jun 24, 2010 at 02:14:51PM +0200, Koch, Sebastian wrote:
> Hi,
> 
> I have a small issue with the ClusterMon agent. The monitor
> actions for this agent seem to fail (according to syslog;
> you'll find the output below) and I am not able to troubleshoot
> it. I tried to start the agent on the failed node by hand but I
> don't see startup / status errors. The ClusterMon seems to fail
> only on the passive node, so I thought it might be a problem
> caused by missing www directories or something similar, but
> I cannot find the error.
> 
> --------------------------------------------------------------------------------------------------------------
> root@pilot01-node2:~/clustercompare# /usr/lib/ocf/resource.d/heartbeat/ClusterMon validate-all
> Validate OK
> root@pilot01-node2:~/clustercompare# /usr/lib/ocf/resource.d/heartbeat/ClusterMon stop; echo "res: $?"
> res: 0
> root@pilot01-node2:~/clustercompare# /usr/lib/ocf/resource.d/heartbeat/ClusterMon start; echo "res: $?"
> res: 0
> root@pilot01-node2:~/clustercompare# /usr/lib/ocf/resource.d/heartbeat/ClusterMon status; echo "res: $?"
> usage: /usr/lib/ocf/resource.d/heartbeat/ClusterMon {start|stop|monitor|validate-all|meta-data}
> 
> Expects to have a fully populated OCF RA-compliant environment set.
> res: 3

If you want to run it by hand you need to set the parameters
(OCF_RESKEY_*) and export OCF_ROOT=/usr/lib/ocf.
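A minimal sketch of what that could look like, assuming the parameter values from the res_ClusterMonitor primitive quoted further down and the agent path used in the commands above. The run_ra helper is hypothetical and only wraps the call so the exit code gets printed. Note that "status" is not among the actions in the usage line, so the sketch calls "monitor" instead.

```shell
#!/bin/sh
# Environment the agent expects when it is run outside the cluster.
export OCF_ROOT=/usr/lib/ocf
# Resource parameters become OCF_RESKEY_<name>; values taken from the
# configuration posted in this thread.
export OCF_RESKEY_htmlfile=/mnt/cluster/var/www/cluster-monitor.html
export OCF_RESKEY_pidfile=/var/run/rlb-cluster-monitor.pid

# Agent path as invoked earlier in this thread.
RA="$OCF_ROOT/resource.d/heartbeat/ClusterMon"

# run_ra is a hypothetical helper: run one action and print its exit code.
run_ra() {
    if [ ! -x "$RA" ]; then
        echo "agent not found: $RA"
        return 5    # OCF_ERR_INSTALLED
    fi
    rc=0
    "$RA" "$1" || rc=$?
    echo "$1 rc: $rc"
    return "$rc"
}

run_ra monitor || echo "monitor did not succeed"
```

An exit code of 0 means the resource is running and 7 means it is cleanly stopped; anything else is worth checking against the logs.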

> --------------------------------------------------------------------------------------------------------------
> 
> I can see that ClusterMon is started and even the HTML output
> works, but there is still this error.

Take a look at the logs, in particular for output from ClusterMon
and lrmd.
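Something along these lines should do as a starting point. The log path /var/log/syslog is an assumption (it varies by distribution and syslog setup), and ra_log_lines is just a made-up helper name.

```shell
#!/bin/sh
# Assumed log location; override with LOG=/path/to/log if yours differs.
LOG=${LOG:-/var/log/syslog}

# ra_log_lines is a hypothetical helper: print the lines of a log file
# that mention ClusterMon or lrmd.
ra_log_lines() {
    grep -E 'ClusterMon|lrmd' "$1"
}

if [ -r "$LOG" ]; then
    # Show only the most recent matches.
    ra_log_lines "$LOG" | tail -n 50 || true
else
    echo "cannot read $LOG"
fi
```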

Thanks,

Dejan

> --------------------------------------------------------------------------------------------------------------
> ============
> Last updated: Thu Jun 24 14:02:48 2010
> Stack: openais
> Current DC: pilot01-node2 - partition with quorum
> Version: 1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
> 
> Online: [ pilot01-node1 pilot01-node2 ]
> 
>  Resource Group: grp_MySQL
>      res_Filesystem     (ocf::heartbeat:Filesystem):    Started pilot01-node2
>      res_ClusterIP      (ocf::heartbeat:IPaddr2):       Started pilot01-node2
>      res_MySQL  (lsb:mysql):    Started pilot01-node2
>      res_Apache (lsb:apache2):  Started pilot01-node2
>      res_ClusterMonitor (ocf::pacemaker:ClusterMon):    Started pilot01-node2
>      res_Nagios (lsb:nagios3):  Started pilot01-node2
>  Master/Slave Set: ms_drbd_mysql0
>      Masters: [ pilot01-node2 ]
>      Slaves: [ pilot01-node1 ]
>  Clone Set: cl-pinggw
>      Started: [ pilot01-node1 pilot01-node2 ]
> Monitor-Cluster (ocf::pacemaker:ClusterMon):    Started pilot01-node2 (unmanaged) FAILED
> 
> Failed actions:
>     Monitor-Cluster_stop_0 (node=pilot01-node2, call=220, rc=1, status=complete): unknown error
> --------------------------------------------------------------------------------------------------------------
> 
> I linked /var/www on both nodes to my cluster drbd storage.
> 
> --------------------------------------------------------------------------------------------------------------
> root@pilot01-node1:/mnt/cluster/var/www# ll /var/www
> lrwxrwxrwx 1 root root 20 23. Jun 17:06 /var/www -> /mnt/cluster/var/www
> --------------------------------------------------------------------------------------------------------------
> 
> This is my configuration.
> 
> --------------------------------------------------------------------------------------------------------------
> node pilot01-node1 \
>         attributes standby="off"
> node pilot01-node2 \
>         attributes standby="off"
> primitive Monitor-Cluster ocf:pacemaker:ClusterMon \
>         params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
>         params pidfile="/var/run/rlb-cluster-monitor.pid" \
>         op start interval="0" timeout="90s" \
>         op stop interval="0" timeout="100s"
> primitive drbd_pilot0 ocf:linbit:drbd \
>         params drbd_resource="pilot0" \
>         operations $id="drbd_pilot0-operations" \
>         op monitor interval="15s"
> primitive pinggw ocf:pacemaker:pingd \
>         params host_list="10.1.1.162" multiplier="200" \
>         op monitor interval="10s"
> primitive res_Apache lsb:apache2 \
>         operations $id="res_Apache-operations" \
>         op monitor interval="15s" timeout="20s" start-delay="15s"
> primitive res_ClusterIP ocf:heartbeat:IPaddr2 \
>         params iflabel="ClusterIP" ip="10.1.1.12" nic="eth0" cidr_netmask="24" \
>         operations $id="res_ClusterIP_1-operations" \
>         op monitor start-delay="0" interval="10s"
> primitive res_ClusterMonitor ocf:pacemaker:ClusterMon \
>         params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
>         params pidfile="/var/run/rlb-cluster-monitor.pid" \
>         op start interval="0" timeout="90s" \
>         op stop interval="0" timeout="100s" \
>         meta target-role="Started"
> primitive res_Filesystem ocf:heartbeat:Filesystem \
>         params fstype="xfs" directory="/mnt/cluster" device="/dev/drbd0" options="noatime,nodiratime,barrier=0"
> primitive res_MySQL lsb:mysql
> primitive res_Nagios lsb:nagios3 \
>         operations $id="res_Nagios-operations" \
>         op monitor interval="15s" timeout="20s" \
>         meta target-role="Started"
> group grp_MySQL res_Filesystem res_ClusterIP res_MySQL res_Apache res_ClusterMonitor res_Nagios
> ms ms_drbd_mysql0 drbd_pilot0 \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> clone cl-pinggw pinggw \
>         meta globally-unique="false"
> location drbd-fence-by-handler-ms_drbd_mysql0 ms_drbd_mysql0 \
>         rule $id="drbd-fence-by-handler-rule-ms_drbd_mysql0" $role="Master" -inf: #uname ne pilot01-node2
> location grp_MySQL-with-pinggw grp_MySQL \
>         rule $id="grp_MySQL-with-pinggw-rule-1" -inf: not_defined pingd or pingd lte 0
> colocation col_drbd_on_mysql inf: grp_MySQL ms_drbd_mysql0:Master
> order mysql_after_drbd inf: ms_drbd_mysql0:promote grp_MySQL:start
> property $id="cib-bootstrap-options" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         dc-version="1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75" \
>         cluster-infrastructure="openais" \
>         last-lrm-refresh="1277380951" \
>         symmetric-cluster="true" \
>         default-action-timeout="240s"
> --------------------------------------------------------------------------------------------------------------
> 
> Sebastian Koch
>                                                          
> 
> NETZWERK GmbH
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
