Andrew,

The pacemaker package built by Ubuntu requires the following dependencies
(besides corosync, resource-agents, and cluster-glue):

*libccs3 libcib1 *libcman3 libcrmcluster1 libcrmcommon2 *libesmtp6 *libfence4
libpe-rules2 libpe-status3 libpengine3 libstonithd1 libtransitioner1

It appears that compiling pacemaker from source provides all of these
dependencies except those marked with an asterisk above. I can install
libesmtp6 and libcman3 without a problem, but libfence4 (and its dependency
libccs3) requires libconfdb4 and libcoroipcc4, which are now both part of the
corosync 1.4.4 package that I compiled. Do I also need to build libccs3 and
libfence4, or are these libraries deprecated in pacemaker 1.1.8? (I don't see
them listed on
https://github.com/ClusterLabs/pacemaker/blob/master/README.markdown.)

Similarly, the Ubuntu corosync package requires the following: libcfg4
libconfdb4 libcoroipcc4 libcoroipcs4 libcpg4 libevs4 liblogsys4 libpload4
libquorum4 libsam4 libtotem-pg4 libvotequorum4; however, all of these appear
to be built into corosync when it is compiled from source.

Thanks,

Andrew

________________________________
From: "Andrew Beekhof" <andrew@beekhof.net>
To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
Sent: Monday, October 15, 2012 5:31:51 AM
Subject: Re: [Pacemaker] STONITHed node cannot rejoin cluster for over 1000 elections

On Sat, Oct 13, 2012 at 1:53 AM, Andrew Martin <amartin@xes-inc.com> wrote:
> Hi Andrew,
>
> Thanks, I'll compile Pacemaker 1.1.8 and Corosync 1.4.4. Can I leave
> cluster-glue and resource-agents at the default versions provided with
> Ubuntu 12.04 (1.0.8 and 3.9.2 respectively), or do I need to upgrade them as
> well?

Should be fine. You will need to obtain a recent libqb build though.

>
> Andrew
>
> ________________________________
> From: "Andrew Beekhof" <andrew@beekhof.net>
> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> Sent: Thursday, October 11, 2012 8:08:13 PM
> Subject: Re: [Pacemaker] STONITHed node cannot rejoin cluster for over 1000
> elections
>
>
> On Fri, Oct 12, 2012 at 7:12 AM, Andrew Martin <amartin@xes-inc.com> wrote:
>> Hello,
>>
>> I am running a 3-node Corosync+Pacemaker cluster with 2 "real" nodes
>> running resources (storage0 and storage1) and a quorum node (storagequorum)
>> in standby mode. All of the nodes run Ubuntu 12.04 server amd64. There are
>> two corosync rings:
>>
>> rrp_mode: active
>>
>> interface {
>>     # the common LAN
>>     ringnumber: 0
>>     bindnetaddr: 10.10.0.0
>>     mcastaddr: 226.94.1.1
>>     mcastport: 5405
>> }
>>
>> interface {
>>     # the STONITH network
>>     ringnumber: 1
>>     bindnetaddr: 192.168.7.0
>>     mcastaddr: 226.94.1.2
>>     mcastport: 5407
>> }
>>
>> DRBD is configured to use /usr/lib/drbd/crm-fence-peer.sh to fence the
>> peer node.
>>
>> There are 3 active interfaces on storage[01]: the common LAN, the STONITH
>> network, and the DRBD replication link. The storagequorum node only has the
>> common LAN and STONITH networks.
>>
>> When looking through the logs, note that the IP addresses for each node
>> are assigned as follows:
>>
>> storage0: xxx.xxx.xxx.148
>> storage1: xxx.xxx.xxx.149
>> storagequorum: xxx.xxx.xxx.24
>>
>> Storage0 and storage1 also had a secondary link to the common LAN which
>> has now been disabled (xxx.xxx.xxx.162 and xxx.xxx.xxx.163 respectively).
>> You still may see it show up in the log, e.g.
>>
>> Oct 5 22:17:39 storagequorum crmd: [7873]: info: crm_update_peer: Node
>> storage1: id=587281418 state=lost addr=r(0) ip(10.10.1.163) r(1)
>> ip(192.168.7.149) votes=1 born=1828352 seen=1828368
>> proc=00000000000000000000000000111312 (new)
>>
>> Here is the CIB configuration:
>> http://pastebin.com/6TPkWtbt
>>
>> As you can see, the drbd-fence-by-handler-ms_drbd_drives primitive keeps
>> getting added into the configuration but doesn't seem to get removed.
>>
>> I recently tried running a failover test by performing "crm resource
>> migrate g_store" when the resources were running on storage1. The
>> ocf:heartbeat:exportfs resources failed to stop due to
>> wait_for_leasetime_on_stop being true (I am going to set this to false now
>> because I don't need NFSv4 support). Recognizing this problem, the cluster
>> correctly STONITHed storage1 and migrated the resources to storage0.
>> However, once storage1 finished rebooting, it was unable to join the
>> cluster (crm_mon shows it as [offline]). I have uploaded the syslog from
>> the DC (storagequorum) from this time period here:
>> http://sources.xes-inc.com/downloads/storagequorum.syslog.log . Initially
>> after the STONITH it seems like storage1 rejoins the cluster successfully:
>>
>> Oct 5 22:17:39 storagequorum cib: [7869]: info: crm_update_peer: Node
>> storage1: id=352400394 state=member (new) addr=r(0) ip(10.10.1.149) r(1)
>> ip(192.168.7.149) (new) votes=1 born=1828384 seen=1828384
>> proc=00000000000000000000000000111312
>>
>> However, later it becomes apparent that it cannot join:
>>
>> Oct 5 22:17:58 storagequorum crmd: [7873]: notice:
>> do_election_count_vote: Election 15 (current: 15, owner: storagequorum):
>> Processed no-vote from storage1 (Peer is not part of our cluster)
>> ....
>> Oct 6 03:49:58 storagequorum crmd: [18566]: notice:
>> do_election_count_vote: Election 989 (current: 1, owner: storage1):
>> Processed vote from storage1 (Peer is not part of our cluster)
>>
>> Around 1000 election cycles occur before storage1 is brought back into the
>> cluster. What is the cause of this and how can I modify my cluster
>> configuration to have nodes rejoin right away?
>
> Its not a configuration issue, you're hitting one or more bugs.
>
> You seem to be using 1.1.6, can I suggest an upgrade to 1.1.8?
> I recall fixing related issues in the last month or so.
> Also consider an updated corosync, there were some related fixes there too.
>
>>
>> Thanks,
>>
>> Andrew Martin
>

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
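
A minimal sketch of the from-source build order discussed in this thread
(libqb first, then corosync 1.4.4, then pacemaker 1.1.8 against it), assuming
release tarballs or checkouts unpacked side by side. The directory names and
the --with-corosync configure flag are illustrative assumptions, not taken
from the thread; check ./configure --help in each source tree for the options
your release actually supports.

  # Pull in the common build tools and headers via the distro packaging
  # (requires deb-src entries in /etc/apt/sources.list).
  sudo apt-get build-dep corosync pacemaker

  # 1. libqb -- needed by pacemaker 1.1.8
  cd libqb
  ./autogen.sh                    # only needed when building from a git checkout
  ./configure && make && sudo make install
  cd ..

  # 2. corosync 1.4.4 -- provides libconfdb, libcoroipcc, etc. itself
  cd corosync-1.4.4
  ./configure && make && sudo make install
  cd ..

  # 3. pacemaker 1.1.8 built against the corosync stack installed above
  cd pacemaker-1.1.8
  ./autogen.sh                    # only needed when building from a git checkout
  ./configure --with-corosync     # flag name is an assumption; see ./configure --help
  make && sudo make install
  cd ..

  # Refresh the dynamic linker cache so the freshly installed libraries are found.
  sudo ldconfig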