Here it is, attached.<br><br>I also see the following two errors in the node 2 logs. I assume they mean the real problem is that node 1 is not getting demoted, but I&#39;m not sure why:<br><br>Error 1:<br>Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Called drbdadm -c /etc/drbd.conf primary mysqld<br>
Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Exit code 11<br>Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Command output:<br>Sep 28 19:53:20 staging2 lrmd: [1442]: info: RA output: (drbd_mysql:1:promote:stdout)<br>
Sep 28 19:53:22 staging2 lrmd: [1442]: info: RA output: (drbd_mysql:1:promote:stderr) 0: State change failed: (-1) Multiple primaries not allowed by config<br><br>Error 2:<br>Sep 28 19:53:27 staging2 kernel: d-con mysqld: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)<br>
Sep 28 19:53:27 staging2 kernel: d-con mysqld: peer( Primary -&gt; Unknown ) conn( Connected -&gt; Disconnecting ) disk( UpToDate -&gt; Outdated ) pdsk( UpToDate -&gt; DUnknown )<br>Sep 28 19:53:27 staging2 kernel: d-con mysqld: meta connection shut down by peer.<br>
<br>Also, failover works fine if I reboot either machine: the outdated machine comes back up as secondary. I only get the errors above when I pull the network cable from the primary. Is this the scenario a STONITH device is supposed to protect against, for example by rebooting the primary?<br>
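<br>For comparison, the DRBD documentation describes resource-level fencing through Pacemaker for this kind of replication-link failure. The fragment below is a minimal sketch, not a tested fix. It assumes DRBD 8.4 (the d-con prefix in the kernel log suggests 8.4, but that is an inference) and that the LINBIT handler scripts are installed under /usr/lib/drbd/, a path that varies by package. The new lines would be merged into the existing resource mysqld block; the protocol, startup and on sections stay as they are and are omitted here.<br>
<pre>
resource mysqld {
  disk {
    on-io-error detach;            # unchanged from the existing config
    # Added: when the replication link is lost, run the fence-peer
    # handler instead of letting each node decide on its own.
    fencing resource-only;
  }
  handlers {
    # Added: adds a constraint to the CIB that keeps the peer from
    # being promoted while its data may be stale.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Added: removes that constraint once this node finishes resyncing.
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
</pre>
This only stops an outdated node from being promoted; it does not power off or reboot anything. Forcibly resetting the old primary is the job of a real STONITH device, which the configuration below disables with stonith-enabled=&quot;false&quot;.<br>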
<br>Feels like I&#39;m getting so close to getting this working!<br><br>Thanks!<br>Charles<br><br><div class="gmail_quote">On Thu, Sep 29, 2011 at 4:15 AM, Andrew Beekhof <span dir="ltr">&lt;<a href="mailto:andrew@beekhof.net">andrew@beekhof.net</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Could you attach /var/lib/pengine/pe-input-3802.bz2 from staging1?<br>
That would tell us why.<br>
<div><div></div><div class="h5"><br>
On Mon, Sep 26, 2011 at 10:28 PM, Charles Richard<br>
&lt;<a href="mailto:chachi.richard@gmail.com">chachi.richard@gmail.com</a>&gt; wrote:<br>
&gt; Hi,<br>
&gt;<br>
&gt; I&#39;m finally making some headway with my Pacemaker install. Now that<br>
&gt; crm_mon no longer reports errors and crm_verify is clean, I&#39;m having a<br>
&gt; problem where my master won&#39;t get promoted. I&#39;m not sure what to do with<br>
&gt; this one; any suggestions? Here are the log snippet and config files:<br>
&gt;<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped!<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State transition S_IDLE -&gt; S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_pe_invoke: Query 106: Requesting the current CIB: S_POLICY_ENGINE<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_pe_invoke_callback: Invoking the PE: query=106, ref=pe_calc-dc-1317020772-95, seq=2564, quorate=1<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_config: Startup probes: enabled<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: unpack_config: On loss of CCM Quorum: Ignore<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_config: Node scores: &#39;red&#39; = -INFINITY, &#39;yellow&#39; = 0, &#39;green&#39; = 0<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_domains: Unpacking domains<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: determine_online_status: Node staging1.dev.applepeak.com is online<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: determine_online_status: Node staging2.dev.applepeak.com is online<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: group_print:  Resource Group: mysql<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print: fs_mysql#011(ocf::heartbeat:Filesystem):#011Stopped<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print: ip_mysql#011(ocf::heartbeat:IPaddr2):#011Stopped<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print: mysqld#011(lsb:mysqld):#011Stopped<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: clone_print:  Master/Slave Set: ms_drbd_mysql<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: short_print:      Stopped: [ drbd_mysql:0 drbd_mysql:1 ]<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: native_merge_weights: fs_mysql: Rolling back scores from ip_mysql<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: native_merge_weights: ip_mysql: Rolling back scores from mysqld<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource fs_mysql#011(Stopped)<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource ip_mysql#011(Stopped)<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource mysqld#011(Stopped)<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource drbd_mysql:0#011(Stopped)<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource drbd_mysql:1#011(Stopped)<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State transition S_POLICY_ENGINE -&gt; S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: unpack_graph: Unpacked transition 72: 0 actions in 0 synapses<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_te_invoke: Processing graph 72 (ref=pe_calc-dc-1317020772-95) derived from /var/lib/pengine/pe-input-3802.bz2<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: run_graph: ====================================================<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: notice: run_graph: Transition 72 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-3802.bz2): Complete<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: te_graph_trigger: Transition 72 is now complete<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: notify_crmd: Transition 72 status: done - &lt;null&gt;<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State transition S_TRANSITION_ENGINE -&gt; S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]<br>
&gt; Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: Starting PEngine Recheck Timer<br>
&gt; Sep 26 04:06:12 staging1 pengine: [1685]: info: process_pe_message: Transition 72: PEngine Input stored in: /var/lib/pengine/pe-input-3802.bz2<br>
&gt; Sep 26 04:15:09 staging1 cib: [1682]: info: cib_stats: Processed 1 operations (0.00us average, 0% utilization) in the last 10min<br>
&gt;<br>
&gt; My drbd config file:<br>
&gt;<br>
&gt; resource mysqld {<br>
&gt;         protocol C;<br>
&gt;         startup { wfc-timeout 0; degr-wfc-timeout 120; }<br>
&gt;         disk { on-io-error detach; }<br>
&gt;<br>
&gt;         on staging1 {<br>
&gt;                 device /dev/drbd0;<br>
&gt;                 disk /dev/vg_staging1/lv_data;<br>
&gt;                 meta-disk internal;<br>
&gt;                 address 10.10.20.1:7788;<br>
&gt;         }<br>
&gt;<br>
&gt;         on staging2 {<br>
&gt;                 device /dev/drbd0;<br>
&gt;                 disk /dev/vg_staging2/lv_data;<br>
&gt;                 meta-disk internal;<br>
&gt;                 address 10.10.20.2:7788;<br>
&gt;         }<br>
&gt; }<br>
&gt;<br>
&gt; corosync.conf:<br>
&gt;<br>
&gt; compatibility: whitetank<br>
&gt;<br>
&gt; aisexec {<br>
&gt;   user: root<br>
&gt;   group: root<br>
&gt; }<br>
&gt;<br>
&gt; totem {<br>
&gt;         version: 2<br>
&gt;         secauth: off<br>
&gt;         threads: 0<br>
&gt;         interface {<br>
&gt;                 ringnumber: 0<br>
&gt;                 bindnetaddr: 10.10.10.0<br>
&gt;                 mcastaddr: 226.94.1.1<br>
&gt;                 mcastport: 5405<br>
&gt;         }<br>
&gt; }<br>
&gt;<br>
&gt; logging {<br>
&gt;         fileline: off<br>
&gt;         to_stderr: no<br>
&gt;         to_logfile: no<br>
&gt;         to_syslog: yes<br>
&gt;         logfile: /var/log/cluster/corosync.log<br>
&gt;         debug: off<br>
&gt;         timestamp: on<br>
&gt;         logger_subsys {<br>
&gt;                 subsys: AMF<br>
&gt;                 debug: off<br>
&gt;         }<br>
&gt; }<br>
&gt;<br>
&gt; amf {<br>
&gt;         mode: disabled<br>
&gt; }<br>
&gt;<br>
&gt; service {<br>
&gt; #Load Pacemaker<br>
&gt; name: pacemaker<br>
&gt; ver: 0<br>
&gt; use_mgmtd: yes<br>
&gt; }<br>
&gt;<br>
&gt; And my crm config:<br>
&gt;<br>
&gt; node staging1.dev.applepeak.com<br>
&gt; node staging2.dev.applepeak.com<br>
&gt; primitive drbd_mysql ocf:linbit:drbd \<br>
&gt;         params drbd_resource=&quot;mysqld&quot; \<br>
&gt;         op monitor interval=&quot;15s&quot; \<br>
&gt;         op start interval=&quot;0&quot; timeout=&quot;240s&quot; \<br>
&gt;         op stop interval=&quot;0&quot; timeout=&quot;100s&quot;<br>
&gt; primitive fs_mysql ocf:heartbeat:Filesystem \<br>
&gt;         params device=&quot;/dev/drbd0&quot; directory=&quot;/opt/data/mysql/data/mysql&quot; fstype=&quot;ext4&quot; \<br>
&gt;         op start interval=&quot;0&quot; timeout=&quot;60s&quot; \<br>
&gt;         op stop interval=&quot;0&quot; timeout=&quot;60s&quot;<br>
&gt; primitive ip_mysql ocf:heartbeat:IPaddr2 \<br>
&gt;         params ip=&quot;10.10.10.31&quot; nic=&quot;eth0&quot;<br>
&gt; primitive mysqld lsb:mysqld<br>
&gt; group mysql fs_mysql ip_mysql mysqld<br>
&gt; ms ms_drbd_mysql drbd_mysql \<br>
&gt;         meta master-max=&quot;1&quot; master-node-max=&quot;1&quot; clone-max=&quot;2&quot; clone-node-max=&quot;1&quot; notify=&quot;true&quot;<br>
&gt; colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master<br>
&gt; order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start<br>
&gt; property $id=&quot;cib-bootstrap-options&quot; \<br>
&gt;         dc-version=&quot;1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe&quot; \<br>
&gt;         cluster-infrastructure=&quot;openais&quot; \<br>
&gt;         expected-quorum-votes=&quot;2&quot; \<br>
&gt;         stonith-enabled=&quot;false&quot; \<br>
&gt;         last-lrm-refresh=&quot;1316961847&quot; \<br>
&gt;         stop-all-resources=&quot;true&quot; \<br>
&gt;         no-quorum-policy=&quot;ignore&quot;<br>
&gt; rsc_defaults $id=&quot;rsc-options&quot; \<br>
&gt;         resource-stickiness=&quot;100&quot;<br>
&gt;<br>
&gt; Thanks,<br>
&gt; Charles<br>
&gt;<br>
</div></div>&gt; _______________________________________________<br>
&gt; Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>
&gt; <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
&gt;<br>
&gt; Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
&gt; Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
&gt; Bugs:<br>
&gt; <a href="http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker" target="_blank">http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker</a><br>
&gt;<br>
&gt;<br>
<br>
</blockquote></div><br>