The file is attached.<br><br>I also see the following two errors in the node 2 (staging2) logs. I assume they mean the real problem is that node 1 is not getting demoted, but I&#39;m not sure why:<br><br>Error 1:<br>Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Called drbdadm -c /etc/drbd.conf primary mysqld<br>

Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Exit code 11<br>Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Command output:<br>Sep 28 19:53:20 staging2 lrmd: [1442]: info: RA output: (drbd_mysql:1:promote:stdout)<br>

Sep 28 19:53:22 staging2 lrmd: [1442]: info: RA output: (drbd_mysql:1:promote:stderr) 0: State change failed: (-1) Multiple primaries not allowed by config<br><br>Error 2:<br>Sep 28 19:53:27 staging2 kernel: d-con mysqld: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)<br>

Sep 28 19:53:27 staging2 kernel: d-con mysqld: peer( Primary -&gt; Unknown ) conn( Connected -&gt; Disconnecting ) disk( UpToDate -&gt; Outdated ) pdsk( UpToDate -&gt; DUnknown )<br>Sep 28 19:53:27 staging2 kernel: d-con mysqld: meta connection shut down by peer.<br>

<br>Also, failover works fine if I reboot either machine: the outdated machine comes back up as secondary. The errors above appear when I pull the network cable from the primary. Is this the scenario a STONITH device is supposed to protect against, potentially by rebooting the primary?<br>

<br>Feels like I&#39;m getting so close to getting this working!<br><br>Thanks!<br>Charles<br><br><div class="gmail_quote">On Thu, Sep 29, 2011 at 4:15 AM, Andrew Beekhof <span dir="ltr">&lt;<a href="mailto:andrew@beekhof.net">andrew@beekhof.net</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Could you attach  /var/lib/pengine/pe-input-3802.bz2 from staging1?<br>

That would tell us why.<br>

<div><div></div><div class="h5"><br>

On Mon, Sep 26, 2011 at 10:28 PM, Charles Richard<br>

&lt;<a href="mailto:chachi.richard@gmail.com">chachi.richard@gmail.com</a>&gt; wrote:<br>

&gt; Hi,<br>

&gt;<br>

&gt; I&#39;m making some headway finally with my pacemaker install but now that<br>

&gt; crm_mon doesn&#39;t return errors any more and crm_verify is clear, I&#39;m having a<br>

&gt; problem where my master won&#39;t get promoted.  Not sure what to do with this<br>

&gt; one, any suggestions?   Here&#39;s the log snippet and config files:<br>

&gt;<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: crm_timer_popped: PEngine<br>

&gt; Recheck Timer (I_PE_CALC) just popped!<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State<br>

&gt; transition S_IDLE -&gt; S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED<br>

&gt; origin=crm_timer_popped ]<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: Progressed<br>

&gt; to state S_POLICY_ENGINE after C_TIMER_POPPED<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: All 2<br>

&gt; cluster nodes are eligible to run resources.<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_pe_invoke: Query 106:<br>

&gt; Requesting the current CIB: S_POLICY_ENGINE<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_pe_invoke_callback: Invoking<br>

&gt; the PE: query=106, ref=pe_calc-dc-1317020772-95, seq=2564, quorate=1<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_config: Startup<br>

&gt; probes: enabled<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: unpack_config: On loss of<br>

&gt; CCM Quorum: Ignore<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_config: Node scores:<br>

&gt; &#39;red&#39; = -INFINITY, &#39;yellow&#39; = 0, &#39;green&#39; = 0<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_domains: Unpacking<br>

&gt; domains<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: determine_online_status:<br>

&gt; Node staging1.dev.applepeak.com is online<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: determine_online_status:<br>

&gt; Node staging2.dev.applepeak.com is online<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: group_print:  Resource<br>

&gt; Group: mysql<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print:<br>

&gt; fs_mysql#011(ocf::heartbeat:Filesystem):#011Stopped<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print:<br>

&gt; ip_mysql#011(ocf::heartbeat:IPaddr2):#011Stopped<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print:<br>

&gt; mysqld#011(lsb:mysqld):#011Stopped<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: clone_print:  Master/Slave<br>

&gt; Set: ms_drbd_mysql<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: short_print:      Stopped:<br>

&gt; [ drbd_mysql:0 drbd_mysql:1 ]<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: master_color: ms_drbd_mysql:<br>

&gt; Promoted 0 instances of a possible 1 to master<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: native_merge_weights:<br>

&gt; fs_mysql: Rolling back scores from ip_mysql<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: native_merge_weights:<br>

&gt; ip_mysql: Rolling back scores from mysqld<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: master_color: ms_drbd_mysql:<br>

&gt; Promoted 0 instances of a possible 1 to master<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource<br>

&gt; fs_mysql#011(Stopped)<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource<br>

&gt; ip_mysql#011(Stopped)<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource<br>

&gt; mysqld#011(Stopped)<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource<br>

&gt; drbd_mysql:0#011(Stopped)<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource<br>

&gt; drbd_mysql:1#011(Stopped)<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State<br>

&gt; transition S_POLICY_ENGINE -&gt; S_TRANSITION_ENGINE [ input=I_PE_SUCCESS<br>

&gt; cause=C_IPC_MESSAGE origin=handle_response ]<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: unpack_graph: Unpacked<br>

&gt; transition 72: 0 actions in 0 synapses<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_te_invoke: Processing graph<br>

&gt; 72 (ref=pe_calc-dc-1317020772-95) derived from<br>

&gt; /var/lib/pengine/pe-input-3802.bz2<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: run_graph:<br>

&gt; ====================================================<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: notice: run_graph: Transition 72<br>

&gt; (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,<br>

&gt; Source=/var/lib/pengine/pe-input-3802.bz2): Complete<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: te_graph_trigger: Transition 72<br>

&gt; is now complete<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: notify_crmd: Transition 72<br>

&gt; status: done - &lt;null&gt;<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State<br>

&gt; transition S_TRANSITION_ENGINE -&gt; S_IDLE [ input=I_TE_SUCCESS<br>

&gt; cause=C_FSA_INTERNAL origin=notify_crmd ]<br>

&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: Starting<br>

&gt; PEngine Recheck Timer<br>

&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: process_pe_message:<br>

&gt; Transition 72: PEngine Input stored in: /var/lib/pengine/pe-input-3802.bz2<br>

&gt; Sep 26 04:15:09 staging1 cib: [1682]: info: cib_stats: Processed 1<br>

&gt; operations (0.00us average, 0% utilization) in the last 10min<br>

&gt;<br>

&gt; My drbd config file:<br>

&gt;<br>

&gt; resource mysqld {<br>

&gt;<br>

&gt; protocol C;<br>

&gt;<br>

&gt; startup { wfc-timeout 0; degr-wfc-timeout 120; }<br>

&gt;<br>

&gt; disk { on-io-error detach; }<br>

&gt;<br>

&gt;<br>

&gt; on staging1 {<br>

&gt;<br>

&gt; device /dev/drbd0;<br>

&gt;<br>

&gt; disk /dev/vg_staging1/lv_data;<br>

&gt;<br>

&gt; meta-disk internal;<br>

&gt;<br>

&gt; address 10.10.20.1:7788;<br>

&gt;<br>

&gt; }<br>

&gt;<br>

&gt; on staging2 {<br>

&gt;<br>

&gt; device /dev/drbd0;<br>

&gt;<br>

&gt; disk /dev/vg_staging2/lv_data;<br>

&gt;<br>

&gt; meta-disk internal;<br>

&gt;<br>

&gt; address 10.10.20.2:7788;<br>

&gt;<br>

&gt; }<br>

&gt;<br>

&gt; }<br>

&gt;<br>

&gt; corosync.conf:<br>

&gt;<br>

&gt; compatibility: whitetank<br>

&gt;<br>

&gt; aisexec {<br>

&gt;   user: root<br>

&gt;   group: root<br>

&gt; }<br>

&gt;<br>

&gt; totem {<br>

&gt;         version: 2<br>

&gt;         secauth: off<br>

&gt;         threads: 0<br>

&gt;         interface {<br>

&gt;                 ringnumber: 0<br>

&gt;                 bindnetaddr: 10.10.10.0<br>

&gt;                 mcastaddr: 226.94.1.1<br>

&gt;                 mcastport: 5405<br>

&gt;         }<br>

&gt; }<br>

&gt;<br>

&gt; logging {<br>

&gt;         fileline: off<br>

&gt;         to_stderr: no<br>

&gt;         to_logfile: no<br>

&gt;         to_syslog: yes<br>

&gt;         logfile: /var/log/cluster/corosync.log<br>

&gt;         debug: off<br>

&gt;         timestamp: on<br>

&gt;         logger_subsys {<br>

&gt;                 subsys: AMF<br>

&gt;                 debug: off<br>

&gt;         }<br>

&gt; }<br>

&gt;<br>

&gt; amf {<br>

&gt;         mode: disabled<br>

&gt; }<br>

&gt;<br>

&gt; service {<br>

&gt; #Load Pacemaker<br>

&gt; name: pacemaker<br>

&gt; ver: 0<br>

&gt; use_mgmtd: yes<br>

&gt; }<br>

&gt;<br>

&gt; And my crm config:<br>

&gt;<br>

&gt; node staging1.dev.applepeak.com<br>

&gt; node staging2.dev.applepeak.com<br>

&gt; primitive drbd_mysql ocf:linbit:drbd \<br>

&gt;         params drbd_resource=&quot;mysqld&quot; \<br>

&gt;         op monitor interval=&quot;15s&quot; \<br>

&gt;         op start interval=&quot;0&quot; timeout=&quot;240s&quot; \<br>

&gt;         op stop interval=&quot;0&quot; timeout=&quot;100s&quot;<br>

&gt; primitive fs_mysql ocf:heartbeat:Filesystem \<br>

&gt;         params device=&quot;/dev/drbd0&quot; directory=&quot;/opt/data/mysql/data/mysql&quot; fstype=&quot;ext4&quot; \<br>

&gt;         op start interval=&quot;0&quot; timeout=&quot;60s&quot; \<br>

&gt;         op stop interval=&quot;0&quot; timeout=&quot;60s&quot;<br>

&gt; primitive ip_mysql ocf:heartbeat:IPaddr2 \<br>

&gt;         params ip=&quot;10.10.10.31&quot; nic=&quot;eth0&quot;<br>

&gt; primitive mysqld lsb:mysqld<br>

&gt; group mysql fs_mysql ip_mysql mysqld<br>

&gt; ms ms_drbd_mysql drbd_mysql \<br>

&gt;         meta master-max=&quot;1&quot; master-node-max=&quot;1&quot; clone-max=&quot;2&quot; clone-node-max=&quot;1&quot; notify=&quot;true&quot;<br>

&gt; colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master<br>

&gt; order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start<br>

&gt; property $id=&quot;cib-bootstrap-options&quot; \<br>

&gt;         dc-version=&quot;1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe&quot; \<br>

&gt;         cluster-infrastructure=&quot;openais&quot; \<br>

&gt;         expected-quorum-votes=&quot;2&quot; \<br>

&gt;         stonith-enabled=&quot;false&quot; \<br>

&gt;         last-lrm-refresh=&quot;1316961847&quot; \<br>

&gt;         stop-all-resources=&quot;true&quot; \<br>

&gt;         no-quorum-policy=&quot;ignore&quot;<br>

&gt; rsc_defaults $id=&quot;rsc-options&quot; \<br>

&gt;         resource-stickiness=&quot;100&quot;<br>

&gt;<br>

&gt; Thanks,<br>

&gt; Charles<br>

&gt;<br>

</div></div>&gt; _______________________________________________<br>

&gt; Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>

&gt; <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

&gt;<br>

&gt; Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>

&gt; Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

&gt; Bugs:<br>

&gt; <a href="http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker" target="_blank">http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker</a><br>

&gt;<br>

&gt;<br>

<br>


</blockquote></div><br>