Hi Andreas,<div><br></div><div>Yes, this is only for testing. In this test the two VMs were not running on the same host: we have two physical servers, each running one VM, and the VMs run Pacemaker/Heartbeat. We reboot both physical servers (to simulate a power failure) and then watch the two VMs negotiate.</div>
<div><br></div><div>--Shyam<br><br><div class="gmail_quote">On Thu, Feb 2, 2012 at 3:38 PM, Andreas Kurz <span dir="ltr">&lt;<a href="mailto:andreas@hastexo.com">andreas@hastexo.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On 02/02/2012 04:45 AM, Shyam wrote:<br>
&gt; Hi Andreas,<br>
&gt;<br>
&gt; Thanks for your reply.<br>
&gt;<br>
&gt; We are using Pacemaker in a VM environment and were primarily checking how it<br>
&gt; behaves when the two nodes hosting the clustered VMs reboot. The elections<br>
&gt; apparently took a very long time.<br>
<br>
</div>Ok, but this is only for testing? For a production system the VMs<br>
running a cluster should not run on the same host, as that would be a single point of failure (SPOF).<br>
<div class="im"><br>
&gt;<br>
&gt; I realized that we had dc-deadtime set to 5 seconds. After raising it<br>
&gt; to 10 seconds, the long election cycle problem disappeared.<br>
<br>
</div>... interesting<br>
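For anyone following along, here is a minimal sketch of how the dc-deadtime change mentioned above could be applied. It assumes crm_attribute is available on a cluster node and uses its short options (-t, -n, -v); the helper name crm_property_cmd is made up for illustration, and the snippet only builds and prints the command rather than running it.

```python
import shlex


def crm_property_cmd(name, value):
    """Build the argv for setting a cluster-wide property (hypothetical helper)."""
    # -t crm_config selects the cluster options section of the CIB,
    # -n is the option name, -v the new value.
    return ["crm_attribute", "-t", "crm_config", "-n", name, "-v", value]


cmd = crm_property_cmd("dc-deadtime", "10s")
# Print the command for review; on a cluster node it could be passed to
# subprocess.run(cmd, check=True) instead.
print(shlex.join(cmd))  # prints: crm_attribute -t crm_config -n dc-deadtime -v 10s
```

The same helper would cover other cluster properties discussed in this thread, such as stop-all-resources.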
<div class="im"><br>
Regards,<br>
Andreas<br>
<br>
--<br>
Need help with Pacemaker?<br>
<a href="http://www.hastexo.com/now" target="_blank">http://www.hastexo.com/now</a><br>
<br>
&gt;<br>
</div><div class="im">&gt; --Shyam<br>
&gt;<br>
&gt; On Thu, Feb 2, 2012 at 3:59 AM, Andreas Kurz &lt;<a href="mailto:andreas@hastexo.com">andreas@hastexo.com</a>&gt; wrote:<br>
</div><div><div></div><div class="h5">
&gt;<br>
&gt;     On 01/27/2012 12:21 PM, Shyam wrote:<br>
&gt;     &gt; Folks,<br>
&gt;     &gt;<br>
&gt;     &gt; We are constantly running into a long election cycle: in a 2-node<br>
&gt;     &gt; cluster, when both nodes are rebooted simultaneously, they spend a<br>
&gt;     &gt; long time running through the election loop.<br>
&gt;<br>
&gt;     Why do you want to reboot them simultaneously? ... Stop them one after<br>
&gt;     another and this will work fine.<br>
&gt;<br>
&gt;     If you want to avoid time-consuming resource movement, set the cluster<br>
&gt;     property stop-all-resources to true before the serialized shutdown.<br>
&gt;<br>
&gt;     Regards,<br>
&gt;     Andreas<br>
&gt;<br>
&gt;<br>
&gt;     &gt;<br>
&gt;     &gt; On one node, Pacemaker loops like this:<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover: Taking over DC status for this partition<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/O mode<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_slave_all for section &#39;all&#39; (origin=local/crmd/222, version=1.1.1): ok (rc=0)<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/W mode<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_master for section &#39;all&#39; (origin=local/crmd/223, version=1.1.1): ok (rc=0)<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/224, version=1.1.1): ok (rc=0)<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/226, version=1.1.1): ok (rc=0)<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all: join-25: Waiting on 2 outstanding join acks<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/228, version=1.1.1): ok (rc=0)<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: config_query_callback: Checking for expired actions every 900000ms<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 50 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC to vsa-0000009c-vc-1 (3.0.1)<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_INTEGRATION -&gt; S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset DC vsa-0000009c-vc-1<br>
&gt;     &gt; Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 51 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)<br>
&gt;     &gt; Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input I_JOIN_REQUEST from route_message() received in state S_ELECTION<br>
&gt;     &gt; Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_ELECTION -&gt; S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]<br>
&gt;     &gt; Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem: Starting sub-system &quot;pengine&quot;<br>
&gt;     &gt; Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem: Client pengine already running as pid 1234<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover: Taking over DC status for this partition<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/O mode<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_slave_all for section &#39;all&#39; (origin=local/crmd/231, version=1.1.1): ok (rc=0)<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/W mode<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_master for section &#39;all&#39; (origin=local/crmd/232, version=1.1.1): ok (rc=0)<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/233, version=1.1.1): ok (rc=0)<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/235, version=1.1.1): ok (rc=0)<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all: join-26: Waiting on 2 outstanding join acks<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/237, version=1.1.1): ok (rc=0)<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: config_query_callback: Checking for expired actions every 900000ms<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 52 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC to vsa-0000009c-vc-1 (3.0.1)<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_INTEGRATION -&gt; S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset DC vsa-0000009c-vc-1<br>
&gt;     &gt; Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 53 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)<br>
&gt;     &gt; Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input I_JOIN_REQUEST from route_message() received in state S_ELECTION<br>
&gt;     &gt; Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_ELECTION -&gt; S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]<br>
&gt;     &gt; Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem: Starting sub-system &quot;pengine&quot;<br>
&gt;     &gt; Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem: Client pengine already running as pid 1234<br>
&gt;     &gt;<br>
&gt;     &gt; and on the other node like this:<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING<br>
&gt;     &gt; Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_PENDING -&gt; S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]<br>
&gt;     &gt; Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_JOIN_OFFER from route_message() received in state S_ELECTION<br>
&gt;     &gt; Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_ELECTION -&gt; S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]<br>
&gt;     &gt; Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC role released<br>
&gt;     &gt; Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control: Transitioner is now inactive<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING<br>
&gt;     &gt; Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_PENDING -&gt; S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]<br>
&gt;     &gt; Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_JOIN_OFFER from route_message() received in state S_ELECTION<br>
&gt;     &gt; Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_ELECTION -&gt; S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]<br>
&gt;     &gt; Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC role released<br>
&gt;     &gt; Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control: Transitioner is now inactive<br>
&gt;     &gt;<br>
&gt;     &gt; This takes several minutes &amp; finally breaks.<br>
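A small sketch for tallying the loop in logs like the ones above. It assumes the raw log file has one entry per line (unlike a mail-wrapped excerpt); the function name summarize and the two regular expressions are illustrative and not part of Pacemaker.

```python
import re
from collections import Counter

# Patterns for the two crmd messages that mark the loop in the excerpts above.
TRANSITION = re.compile(r"State transition (S_\w+) -> (S_\w+)")
ELECTION = re.compile(r"do_election_count_vote: Election (\d+)")


def summarize(lines):
    """Count FSA state transitions and collect election numbers from log lines."""
    transitions = Counter()
    elections = []
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            transitions[(m.group(1), m.group(2))] += 1
        m = ELECTION.search(line)
        if m:
            elections.append(int(m.group(1)))
    return transitions, elections


# Three entries taken from the vc-1 log above, unwrapped to one line each.
sample = [
    "Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: "
    "Election 50 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)",
    "Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: "
    "State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]",
    "Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: "
    "Election 51 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)",
]

transitions, elections = summarize(sample)
print(elections)  # prints: [50, 51]
print(transitions[("S_INTEGRATION", "S_ELECTION")])  # prints: 1
```

Run against a full log, a steadily climbing election number together with repeated S_INTEGRATION -> S_ELECTION transitions would show how many times the nodes went around the loop.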
&gt;     &gt;<br>
&gt;     &gt; Any pointers on what can be causing this?<br>
&gt;     &gt;<br>
&gt;     &gt; Thanks.<br>
&gt;     &gt;<br>
&gt;     &gt; --Shyam<br>
&gt;     &gt;<br>
&gt;     &gt;<br>
&gt;     &gt; _______________________________________________<br>
&gt;     &gt; Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>
</div></div>&gt;     &lt;mailto:<a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a>&gt;<br>
<div class="im">&gt;     &gt; <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
&gt;     &gt;<br>
&gt;     &gt; Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
&gt;     &gt; Getting started:<br>
&gt;     <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
&gt;     &gt; Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
&gt;<br>
&gt;<br>
&gt;<br>
</div><div><div></div><div class="h5">
&gt;<br>
&gt;<br>
&gt;<br>
&gt;<br>
<br>
<br>
</div></div><br>
<br></blockquote></div><br></div>