[Pacemaker] node1 fencing itself after node2 being fenced

Asgaroth lists at blueface.com
Wed Feb 5 10:53:13 EST 2014


On 05/02/2014 13:44, Nikita Staroverov wrote:
> Your setup is completely wrong, sorry. You must use RHEL6 
> documentation not RHEL7.
> in short, you should create cman cluster according to RHEL6 docs, but 
> use pacemaker instead of rgmanager and fence_pcmk as fence agent for cman.

Thanks for the info. However, I am already using cman for cluster
management and pacemaker as the resource manager. This is how I created
the cluster, and it appears to be working OK. Please let me know if this
is not the correct method for CentOS/RHEL 6.5:

---
ccs -f /etc/cluster/cluster.conf --createcluster sftp-cluster
ccs -f /etc/cluster/cluster.conf --addnode test01
ccs -f /etc/cluster/cluster.conf --addalt test01 test01-alt
ccs -f /etc/cluster/cluster.conf --addnode test02
ccs -f /etc/cluster/cluster.conf --addalt test02 test02-alt
ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect test01
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect test02
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk test01 \
    pcmk-redirect port=test01
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk test02 \
    pcmk-redirect port=test02
ccs -f /etc/cluster/cluster.conf --setcman \
    keyfile="/etc/corosync/authkey" transport="udpu" port="5405"
ccs -f /etc/cluster/cluster.conf --settotem rrp_mode="active"
sed -i.bak "s/.*CMAN_QUORUM_TIMEOUT=.*/CMAN_QUORUM_TIMEOUT=0/g" \
    /etc/sysconfig/cman

pcs stonith create fence_test01 fence_vmware_soap login="user" \
    passwd="password" action="reboot" ipaddr="vcenter_host" port="TEST01" \
    ssl="1" pcmk_host_list="test01" delay="15"
pcs stonith create fence_test02 fence_vmware_soap login="user" \
    passwd="password" action="reboot" ipaddr="vcenter_host" port="TEST02" \
    ssl="1" pcmk_host_list="test02"

pcs property set no-quorum-policy="ignore"
pcs property set stonith-enabled="true"
---

The above is taken directly from the Pacemaker RHEL 6 two-node cluster
quick start guide (except for the fence agent definitions).
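
As a quick check of what the sed expression in the listing does, the
sketch below applies it to a scratch file instead of the real
/etc/sysconfig/cman. The sample file contents (a commented-out default
of 45 and an OTHER_SETTING placeholder line) are assumptions made only
for illustration.

```shell
# Work on a scratch file so nothing under /etc is touched.
tmp=$(mktemp)

# Assumed sample content; the real /etc/sysconfig/cman may differ.
printf '%s\n' '#CMAN_QUORUM_TIMEOUT=45' 'OTHER_SETTING=1' > "$tmp"

# Same expression as in the listing: any line containing the assignment,
# commented out or not, is replaced wholesale; a .bak copy is kept.
sed -i.bak "s/.*CMAN_QUORUM_TIMEOUT=.*/CMAN_QUORUM_TIMEOUT=0/g" "$tmp"

result=$(grep 'CMAN_QUORUM_TIMEOUT' "$tmp")
untouched=$(grep -c '^OTHER_SETTING=1$' "$tmp")
backup=$(head -n 1 "$tmp.bak")

echo "$result"    # prints CMAN_QUORUM_TIMEOUT=0

rm -f "$tmp" "$tmp.bak"
```

Note that if the file contains no CMAN_QUORUM_TIMEOUT line at all, sed
leaves it unchanged and the setting is not applied, so it is worth
grepping the real file afterwards.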

At this point the cluster comes up, cman_tool sees the two hosts as
joined, and the cluster is communicating over the two rings defined. I
couldn't find the equivalent "pcs" syntax to perform the above
configuration. Looking at the pcs man page, I couldn't track down how
to, for example, set the security key file using pcs syntax.

The DLM/CLVMD/GFS2 configuration was taken from the RHEL7 documentation,
as it illustrates how to set it up using pcs syntax. The configuration
commands appear to work fine and the services appear to be configured
correctly, as pacemaker starts them properly. The cluster also appears
to work properly if I enable/disable the services using pcs syntax, if I
manually stop/start the pacemaker service, or if I perform a clean
shutdown/restart of the second node. The issue arises when I test a
crash of the second node, which is where I see the problem with
fencing.
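
This message does not show the DLM/CLVMD commands themselves. As a rough
sketch, the RHEL 7 documentation mentioned above creates the two clones
and ties them together along the following lines. The resource names dlm
and clvmd follow that documentation as best recalled, not this cluster's
actual configuration, and whether the ocf:heartbeat:clvm agent is
present on a given CentOS/RHEL 6.5 install is an assumption to verify.
The run wrapper is a hypothetical helper that only prints the commands
unless DRY_RUN=0 is set.

```shell
# Dry-run wrapper (hypothetical helper): prints each command instead of
# executing it unless DRY_RUN=0, so the sketch is safe on any machine.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

configure_storage_stack() {
    # DLM control daemon, cloned so it runs on every node.
    run pcs resource create dlm ocf:pacemaker:controld \
        op monitor interval=30s on-fail=fence clone interleave=true ordered=true
    # clvmd, also cloned (agent availability on RHEL 6.5 is an assumption).
    run pcs resource create clvmd ocf:heartbeat:clvm \
        op monitor interval=30s on-fail=fence clone interleave=true ordered=true
    # clvmd must start after, and run alongside, dlm.
    run pcs constraint order start dlm-clone then clvmd-clone
    run pcs constraint colocation add clvmd-clone with dlm-clone
}

configure_storage_stack
```

If the monitors are created with on-fail=fence as sketched, a failed
clvmd monitor on the surviving node is itself a request to fence that
node, which may be worth checking against the behaviour reported in this
thread.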

Reading some archives of this mailing list, there seem to be suggestions
that dlm may be waiting on pacemaker to fence a node, which then causes
a temporary "freeze" of the clvmd/gfs2 configuration; I understand this
is by design. However, when I test a crash of the 2nd node by running
"echo c > /proc/sysrq-trigger", I can see that stonithd begins fencing
procedures against node2. At this point, according to crm_mon, the dlm
service is stopped on node2 and started on node1. clvmd then goes into
a failed state, I presume because of a timeout (I could be wrong) or
because it cannot communicate with clvmd on node2. Once clvmd is in a
failed state, stonithd attempts to fence node1, and it does so
successfully by shutting it down.

Some archive messages seem to suggest that clvmd should be started 
outside of the cluster at system boot (cman -> clvmd -> pacemaker), 
however, my personal preference would be to have these services managed 
by the cluster infrastructure, which is why I am attempting to set it up 
in this manner.

Is there anyone else out there that may be running a similar 
configuration dlm/clvmd/[gfs/gfs2/ocfs] under pacemaker control?

Again, thanks for the info, I will do some more reading to ensure that I 
am using the correct syntax for pcs to configure these services.

Thanks