<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">I'm having some issues getting cluster monitoring set up and configured on a 3-node multi-state cluster. I'm using Florian's blog as an example: <a href="http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/">http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/</a>.<div><br></div><div>When I create the primitive resource, it starts on one of my nodes but spawns multiple instances of crm_mon. I don't see any reason for it to spawn multiple instances; it's very odd behavior.</div><div><br></div><div>I was also looking for some clarification on what this resource provides. It looks to me like it kicks off crm_mon in daemon mode, which updates an .html file and, with -E, runs an external script. But the resource itself doesn't trigger anything when another resource changes state; it acts only if the crm_mon process (monitored by PID) fails and has to be restarted. If this is correct, what is the best practice for monitoring the state of additional resources?</div><div><br></div><div>v/r</div><div><br></div><div>STEVE</div><div><br></div><div><br></div><div>Below are some additional data points. 
</div><div><br></div><div><br></div><div><u><b>Creating the Resource</b></u></div><div><u><br></u></div><div><div>[root@pgdb2 tmp]# crm configure primitive SNMPMon ocf:pacemaker:ClusterMon \</div><div>> params user="root" update="30" extra_options="-E /usr/local/bin/pcmk_snmp_helper.sh -e <a href="http://zen.arin.net">zen.arin.net</a>" \</div><div>> op monitor on-fail="restart" interval="60"</div></div><div><br></div><div><br></div><div><u><b>Manual crm_mon output</b></u></div><div><br></div><div><div>Last updated: Thu May 9 10:24:30 2013</div><div>Last change: Thu May 9 10:20:49 2013 via cibadmin on <a href="http://pgdb2.example.com">pgdb2.example.com</a></div><div>Stack: cman</div><div>Current DC: <a href="http://pgdb1.example.com">pgdb1.example.com</a> - partition with quorum</div><div>Version: 1.1.8-7.el6-394e906</div><div>3 Nodes configured, unknown expected votes</div><div>6 Resources configured.</div><div><br></div><div><br></div><div>Node <a href="http://pgdb1.example.com">pgdb1.example.com</a>: standby</div><div>Online: [ <a href="http://pgdb2.example.com">pgdb2.example.com</a> <a href="http://pgdb3.example.com">pgdb3.example.com</a> ]</div><div><br></div><div> PG_REP_VIP<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:IPaddr2):<span class="Apple-tab-span" style="white-space:pre">        </span>Started <a href="http://pgdb2.example.com">pgdb2.example.com</a></div><div> PG_CLI_VIP<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::heartbeat:IPaddr2):<span class="Apple-tab-span" style="white-space:pre">        </span>Started <a href="http://pgdb2.example.com">pgdb2.example.com</a></div><div> Master/Slave Set: msPGSQL [PGSQL]</div><div> Masters: [ <a href="http://pgdb2.example.com">pgdb2.example.com</a> ]</div><div> Slaves: [ <a href="http://pgdb3.example.com">pgdb3.example.com</a> ]</div><div> Stopped: [ PGSQL:2 ]</div><div> SNMPMon<span class="Apple-tab-span" style="white-space:pre">        
</span>(ocf::pacemaker:ClusterMon):<span class="Apple-tab-span" style="white-space:pre">        </span>Started <a href="http://pgdb3.example.com">pgdb3.example.com</a></div></div><div><br></div><div><u><b>PS to check for process on pgdb3</b></u></div><div><br></div><div><div>[root@pgdb3 tmp]# ps aux | grep crm_mon</div><div>root 16097 0.0 0.0 82624 2784 ? S 10:20 0:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e <a href="http://zen.arin.net">zen.arin.net</a> -h /tmp/ClusterMon_SNMPMon.html</div><div>root 16099 0.0 0.0 82624 2660 ? S 10:20 0:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e <a href="http://zen.arin.net">zen.arin.net</a> -h /tmp/ClusterMon_SNMPMon.html</div><div>root 16104 0.0 0.0 82624 2448 ? S 10:20 0:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMPMon.pid -d -i 0 -E /usr/local/bin/pcmk_snmp_helper.sh -e <a href="http://zen.arin.net">zen.arin.net</a> -h /tmp/ClusterMon_SNMPMon.html</div><div>root 16515 0.0 0.0 103244 852 pts/0 S+ 10:21 0:00 grep crm_mon</div></div><div><br></div><div><b><u>Output from corosync.log</u></b></div><div><br></div><div><div>May 09 10:20:51 [3100] <a href="http://pgdb3.cha.arin.net">pgdb3.cha.arin.net</a> lrmd: info: process_lrmd_get_rsc_info: Resource 'SNMPMon' not found (3 active resources)</div><div>May 09 10:20:51 [3100] <a href="http://pgdb3.cha.arin.net">pgdb3.cha.arin.net</a> lrmd: info: process_lrmd_rsc_register: Added 'SNMPMon' to the rsc list (4 active resources)</div><div>May 09 10:20:52 [3103] <a href="http://pgdb3.cha.arin.net">pgdb3.cha.arin.net</a> crmd: info: services_os_action_execute: Managed ClusterMon_meta-data_0 process 16010 exited with rc=0</div><div>May 09 10:20:52 [3103] <a href="http://pgdb3.cha.arin.net">pgdb3.cha.arin.net</a> crmd: notice: process_lrm_event: LRM operation SNMPMon_monitor_0 (call=61, rc=7, cib-update=28, confirmed=true) not running</div><div>May 09 10:20:52 [3103] <a 
href="http://pgdb3.cha.arin.net">pgdb3.cha.arin.net</a> crmd: notice: process_lrm_event: LRM operation SNMPMon_start_0 (call=64, rc=0, cib-update=29, confirmed=true) ok</div><div>May 09 10:20:52 [3103] <a href="http://pgdb3.cha.arin.net">pgdb3.cha.arin.net</a> crmd: notice: process_lrm_event: LRM operation SNMPMon_monitor_60000 (call=67, rc=0, cib-update=30, confirmed=false) ok</div></div></body></html>