<div dir="ltr">Hi,<div><br></div><div>Sorry, I have managed to fix this now. I noticed this in the log line:</div><div><br></div><div><span style="font-family:arial,sans-serif;font-size:13px">Aug 6 13:26:23 ldb03 cibadmin[2140]: notice: crm_log_args: Invoked:</span></div>
<span style="font-family:arial,sans-serif;font-size:13px">cibadmin -M -c -o status --xml-text &lt;node_state id="ldb03" uname="ldb03"</span><br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">ha="active" in_ccm="false" crmd="offline" join="member" expected="down"</span><br style="font-family:arial,sans-serif;font-size:13px">
<span style="font-family:arial,sans-serif;font-size:13px">crm-debug-origin="manual_clear" shutdown="0"/&gt;</span><div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:13px">The id in that node_state entry is the node name, ldb03, rather than the node's numeric ID, 12303.</span></div>
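<div><br></div><div>For anyone checking for the same mismatch, here is a minimal Python sketch of the comparison. It assumes the node definitions and the node_state attributes are pasted in as text (both copied from this thread); the function names are made up for illustration and are not part of Pacemaker or crmsh.</div>

```python
import re

# Node definitions copied from the cluster config in this thread.
CONFIG = '''
node $id="12303" ldb03
node $id="12304" ldb04
'''

# Attributes of the node_state entry from the cibadmin log line.
LOGGED_STATE = 'id="ldb03" uname="ldb03" ha="active" in_ccm="false" crmd="offline"'


def configured_nodes(config_text):
    """Map node name to numeric node ID from 'node $id="N" name' lines."""
    pattern = re.compile(r'^node \$id="(\d+)" (\S+)$', re.MULTILINE)
    return {name: node_id for node_id, name in pattern.findall(config_text)}


def parse_attributes(text):
    """Return the key="value" pairs of an XML-style attribute string as a dict."""
    return dict(re.findall(r'([\w-]+)="([^"]*)"', text))


def stale_state_reason(state, nodes):
    """Describe the mismatch if a node_state id is not the node's numeric ID."""
    expected = nodes.get(state["uname"])
    if expected is None:
        return "unknown node " + state["uname"]
    if state["id"] != expected:
        return "node_state id is %s, expected %s" % (state["id"], expected)
    return None


nodes = configured_nodes(CONFIG)
state = parse_attributes(LOGGED_STATE)
print(stale_state_reason(state, nodes))
```

<div>On the values above it prints "node_state id is ldb03, expected 12303".</div>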
<div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:13px">I removed the stale entry using: crm_node -R "ldb03" --force</span></div><div>and then rebooted.</div>
<div><br></div><div>Nodes are now in sync.</div><div><br></div><div>Thanks,</div><div><br></div><div>Jamie.</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Aug 6, 2014 at 2:43 PM, Jamie <span dir="ltr">&lt;<a href="mailto:thisbodydrop@gmail.com" target="_blank">thisbodydrop@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
I have set up a two-node cluster using the following packages:<br>
<br>
pacemaker 1.1.10+git20130802-1ubuntu2<br>
corosync 2.3.3-1ubuntu1<br>
<br>
My cluster config is as follows:<br>
<br>
node $id="12303" ldb03<br>
node $id="12304" ldb04<br>
primitive p_fence_ldb03 stonith:external/vcenter \<br>
params VI_SERVER="10.17.248.10" VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" HOSTLIST="ldb03=ldb03" RESETPOWERON="0" pcmk_host_check="static-list" pcmk_host_list="ldb03" \<br>
op start interval="0" timeout="500s"<br>
primitive p_fence_ldb04 stonith:external/vcenter \<br>
params VI_SERVER="10.17.248.10" VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" HOSTLIST="ldb04=ldb04" RESETPOWERON="0" pcmk_host_check="static-list" pcmk_host_list="ldb04" \<br>
op start interval="0" timeout="500s"<br>
primitive p_fs_mysql ocf:heartbeat:Filesystem \<br>
params device="nfsserver:/LDB_Cluster1" directory="/var/lib/mysql" fstype="nfs" options="relatime,rw,hard,nointr,rsize=32768,wsize=32768,bg,vers=3,proto=tcp" \<br>
op start interval="0" timeout="60s" \<br>
op stop interval="0" timeout="120s" \<br>
op monitor interval="60s" timeout="60s" \<br>
meta is-managed="true"<br>
primitive p_ip_1 ocf:heartbeat:IPaddr2 \<br>
params ip="10.10.10.11" cidr_netmask="25" \<br>
op monitor interval="30s" \<br>
meta target-role="Started" is-managed="true"<br>
primitive p_ip_2 ocf:heartbeat:IPaddr2 \<br>
params ip="10.10.10.12" cidr_netmask="25" \<br>
op monitor interval="30s" \<br>
meta target-role="Started" is-managed="true"<br>
primitive p_ip_3 ocf:heartbeat:IPaddr2 \<br>
params ip="10.10.10.13" cidr_netmask="25" \<br>
op monitor interval="30s" \<br>
meta target-role="Started" is-managed="true"<br>
primitive p_mysql ocf:heartbeat:mysql \<br>
params datadir="/var/lib/mysql" binary="/usr/bin/mysqld_safe" socket="/var/run/mysqld/mysqld.sock" \<br>
op start interval="0" timeout="120" \<br>
op stop interval="0" timeout="120" \<br>
op monitor interval="20" timeout="30" \<br>
meta target-role="Started" is-managed="true"<br>
group g_mysql p_fs_mysql p_mysql p_ip_1 p_ip_2 p_ip_3 \<br>
location l_fence_ldb03 p_fence_ldb03 -inf: ldb03<br>
location l_fence_ldb04 p_fence_ldb04 -inf: ldb04<br>
property $id="cib-bootstrap-options" \<br>
dc-version="1.1.10-42f2063" \<br>
cluster-infrastructure="corosync" \<br>
no-quorum-policy="ignore" \<br>
stonith-enabled="true" \<br>
stop-all-resources="false" \<br>
expected-quorum-votes="2" \<br>
last-lrm-refresh="1407325251"<br>
<br>
<br>
This exact configuration worked during setup, but I have now run into a<br>
problem with my inactive node, ldb03. Corosync shows this node as up:<br>
<br>
root@ldb03:~# corosync-cmapctl | grep members<br>
runtime.totem.pg.mrp.srp.members.12303.config_version (u64) = 0<br>
runtime.totem.pg.mrp.srp.members.12303.ip (str) = r(0) ip(10.10.10.8)<br>
runtime.totem.pg.mrp.srp.members.12303.join_count (u32) = 1<br>
runtime.totem.pg.mrp.srp.members.12303.status (str) = joined<br>
runtime.totem.pg.mrp.srp.members.12304.config_version (u64) = 0<br>
runtime.totem.pg.mrp.srp.members.12304.ip (str) = r(0) ip(10.10.10.9)<br>
runtime.totem.pg.mrp.srp.members.12304.join_count (u32) = 1<br>
runtime.totem.pg.mrp.srp.members.12304.status (str) = joined<br>
<br>
Both crm status and crm node status show it as online:<br>
<br>
Last updated: Wed Aug 6 14:16:24 2014<br>
Last change: Wed Aug 6 14:02:00 2014 via crm_resource on ldb04<br>
Stack: corosync<br>
Current DC: ldb04 (12304) - partition with quorum<br>
Version: 1.1.10-42f2063<br>
2 Nodes configured<br>
7 Resources configured<br>
Online: [ ldb03 ldb04 ]<br>
<br>
root@ldb03:~# crm node status<br>
&lt;nodes&gt;<br>
&lt;node id="12304" uname="ldb04"/&gt;<br>
&lt;node id="12303" uname="ldb03"/&gt;<br>
&lt;/nodes&gt;<br>
<br>
<br>
However, after seeing this entry in my logs:<br>
Aug 6 13:26:23 ldb03 cibadmin[2140]: notice: crm_log_args: Invoked:<br>
cibadmin -M -c -o status --xml-text &lt;node_state id="ldb03" uname="ldb03"<br>
ha="active" in_ccm="false" crmd="offline" join="member" expected="down"<br>
crm-debug-origin="manual_clear" shutdown="0"/&gt;<br>
<br>
I noticed that cibadmin shows it as normal(offline):<br>
root@ldb03:~# crm node show<br>
ldb04(12304): normal<br>
ldb03(12303): normal(offline)<br>
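<br>
To make the disagreement explicit, here is a rough Python sketch that compares the corosync membership with the crm node show output pasted above. It only parses the text formats shown in this mail (other versions may print them differently), and the function names are hypothetical, not part of any Pacemaker or corosync tool.<br>

```python
import re

# Excerpts pasted above: corosync membership and `crm node show` output.
CMAP = """\
runtime.totem.pg.mrp.srp.members.12303.status (str) = joined
runtime.totem.pg.mrp.srp.members.12304.status (str) = joined
"""
NODE_SHOW = """\
ldb04(12304): normal
ldb03(12303): normal(offline)
"""


def joined_ids(cmap_output):
    """Node IDs whose corosync member status is 'joined'."""
    pattern = r'members\.(\d+)\.status \(str\) = joined'
    return set(re.findall(pattern, cmap_output))


def offline_ids(node_show_output):
    """Node IDs that `crm node show` marks with an (offline) suffix."""
    pattern = r'^\S+\((\d+)\): \S*\(offline\)$'
    return set(re.findall(pattern, node_show_output, re.MULTILINE))


# Nodes corosync considers joined but the cluster status lists as offline.
disagreement = sorted(joined_ids(CMAP).intersection(offline_ids(NODE_SHOW)))
print(disagreement)
```

On these excerpts it prints ['12303'], i.e. ldb03 is joined in corosync but flagged as offline.<br>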
<br>
The offline state does not appear anywhere except cibadmin: it is not in<br>
cib.xml or in corosync-quorumtool, and a tcpdump shows multicast traffic<br>
from both hosts.<br>
<br>
I tried (hesitantly) to delete that entry using cibadmin, but I couldn't quite<br>
get the syntax right. Any tips on how to get this node to show as online so<br>
that it can run resources again? Currently, running crm resource move has no<br>
effect: there are no errors and nothing noticeable in the log files either.<br>
<br>
Sorry for the long post. I can attach more logs or config if necessary.<br>
<br>
Thanks,<br>
<br>
Jamie.<br>
<br>
<br>
_______________________________________________<br>
Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>
<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
</blockquote></div><br></div>