[Pacemaker] node1 fencing itself after node2 being fenced
Asgaroth
lists at blueface.com
Wed Feb 5 15:53:13 UTC 2014
On 05/02/2014 13:44, Nikita Staroverov wrote:
> Your setup is completely wrong, sorry. You must use RHEL6
> documentation not RHEL7.
> in short, you should create cman cluster according to RHEL6 docs, but
> use pacemaker instead of rgmanager and fence_pcmk as fence agent for cman.
Thanks for the info. However, I am already using cman for cluster
management and pacemaker as the resource manager. This is how I created
the cluster, and it appears to be working OK. Please let me know if this
is not the correct method for CentOS/RHEL 6.5:
---
ccs -f /etc/cluster/cluster.conf --createcluster sftp-cluster
ccs -f /etc/cluster/cluster.conf --addnode test01
ccs -f /etc/cluster/cluster.conf --addalt test01 test01-alt
ccs -f /etc/cluster/cluster.conf --addnode test02
ccs -f /etc/cluster/cluster.conf --addalt test02 test02-alt
ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect test01
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect test02
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk test01 \
    pcmk-redirect port=test01
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk test02 \
    pcmk-redirect port=test02
ccs -f /etc/cluster/cluster.conf --setcman \
    keyfile="/etc/corosync/authkey" transport="udpu" port="5405"
ccs -f /etc/cluster/cluster.conf --settotem rrp_mode="active"
sed -i.bak "s/.*CMAN_QUORUM_TIMEOUT=.*/CMAN_QUORUM_TIMEOUT=0/g" \
    /etc/sysconfig/cman
pcs stonith create fence_test01 fence_vmware_soap login="user" \
    passwd="password" action="reboot" ipaddr="vcenter_host" port="TEST01" \
    ssl="1" pcmk_host_list="test01" delay="15"
pcs stonith create fence_test02 fence_vmware_soap login="user" \
    passwd="password" action="reboot" ipaddr="vcenter_host" port="TEST02" \
    ssl="1" pcmk_host_list="test02"
pcs property set no-quorum-policy="ignore"
pcs property set stonith-enabled="true"
---
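One way to confirm that the ccs commands produced a fence_pcmk redirect
for each node is to look for the device entry in cluster.conf. The
following is only a minimal sketch: the function name has_pcmk_redirect
is made up for illustration, and it assumes each <device .../> element
sits on a single line of the file. It is a plain text match, not an XML
parse.

```shell
#!/bin/sh
# has_pcmk_redirect FILE NODE   (hypothetical helper, not part of ccs or pcs)
# Succeeds if FILE has a <device> line that names the "pcmk" fence device
# and carries port=NODE. Assumes one <device .../> element per line.
has_pcmk_redirect() {
    conf=$1
    node=$2
    grep '<device ' "$conf" 2>/dev/null \
        | grep 'name="pcmk"' \
        | grep -qF "port=\"$node\""
}

# Example (would be run on a cluster node):
#   for n in test01 test02; do
#       has_pcmk_redirect /etc/cluster/cluster.conf "$n" \
#           || echo "no pcmk redirect found for $n"
#   done
```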
The above is taken directly from the Pacemaker RHEL 6 two-node cluster
quick start guide (except for the fence agent definitions).
At this point the cluster comes up, cman_tool sees the two hosts as
joined, and the cluster is communicating over the two rings defined. I
couldn't find the equivalent "pcs" syntax to perform the above
configuration; looking at the pcs man page, I couldn't track down how
to, for example, set the security key file using pcs syntax.
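The membership check described here can be scripted. This is a rough
sketch, assuming that "cman_tool nodes" prints the node status in the
second column, with "M" meaning member; count_members is a name made up
for this example.

```shell
#!/bin/sh
# count_members   (hypothetical helper)
# Reads `cman_tool nodes` output on stdin and prints how many nodes are in
# state "M". Assumption: the status is the second whitespace-separated column.
count_members() {
    awk '$2 == "M" { n++ } END { print n + 0 }'
}

# Example (would be run on a cluster node):
#   cman_tool nodes | count_members   # number of nodes cman sees as members
#   corosync-cfgtool -s               # reports the status of each ring
```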
The DLM/CLVMD/GFS2 configuration was taken from the RHEL7 documentation,
as it illustrated how to set it up using pcs syntax. The configuration
commands appear to work fine and the services appear to be configured
correctly, as pacemaker starts them properly. The cluster also appears
to work properly if I enable/disable the services using pcs syntax, if I
manually stop/start the pacemaker service, or if I perform a clean
shutdown/restart of the second node. The issue comes in when I test a
crash of the second node, which is where I see the fencing problem.
Reading some archives of this mailing list, there seem to be suggestions
that dlm may be waiting on pacemaker to fence a node, which then causes
a temporary "freeze" of the clvmd/gfs2 configuration; I understand this
is by design. However, when I test a hang of the 2nd node by doing an
"echo c > /proc/sysrq-trigger" on it, I can see that stonithd begins
fencing procedures around node2. At this point, according to crm_mon,
the dlm service is stopped on node2 and started on node1. clvmd then
goes into a failed state, I presume because of a possible timeout (I
could be wrong) or, potentially, because it cannot communicate with
clvmd on node2. When clvmd goes into a failed state, stonithd attempts
to fence node1, and it does so successfully by shutting it down.
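To see which layer is waiting on which during such a test, it may help
to capture the state of cman, the fence domain, dlm and pacemaker before
the crash and again while node1 is still up. The following is only a
sketch under assumptions: snapshot_cluster_state and the output file
names are made up here, and tools that are not installed are simply
skipped.

```shell
#!/bin/sh
# snapshot_cluster_state OUTDIR   (hypothetical helper)
# Writes the output of a few status commands into OUTDIR, one file each.
# A tool that is missing is recorded as skipped rather than treated as fatal.
snapshot_cluster_state() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    # Each line below is "file name|command"; the file names are arbitrary.
    while IFS='|' read -r name cmd; do
        tool=${cmd%% *}
        if command -v "$tool" >/dev/null 2>&1; then
            # </dev/null keeps the command from consuming the list below
            $cmd >"$outdir/$name.txt" 2>&1 </dev/null || true
        else
            echo "skipped: $tool not found" >"$outdir/$name.txt"
        fi
    done <<EOF
membership|cman_tool nodes
fence_domain|fence_tool ls
dlm_lockspaces|dlm_tool ls
resources|crm_mon -1
EOF
}

# Example (would be run on node1):
#   snapshot_cluster_state /tmp/before-crash
#   ...crash node2, then...
#   snapshot_cluster_state /tmp/after-crash
#   diff -ru /tmp/before-crash /tmp/after-crash
```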
Some archive messages seem to suggest that clvmd should be started
outside of the cluster at system boot (cman -> clvmd -> pacemaker),
however, my personal preference would be to have these services managed
by the cluster infrastructure, which is why I am attempting to set it up
in this manner.
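For comparison, the boot-time alternative (cman -> clvmd -> pacemaker)
relies on SysV init start order on RHEL/CentOS 6, where each enabled
service gets an S<NN><name> link in the runlevel directory. A minimal
sketch for checking that order follows; starts_in_order is a made-up
name, and the check only compares link names, it does not start
anything.

```shell
#!/bin/sh
# starts_in_order RCDIR SERVICE...   (hypothetical helper)
# Succeeds if every SERVICE has a start link S<NN><service> in RCDIR and
# the links sort in the same order as the services were given.
starts_in_order() {
    rcdir=$1
    shift
    prev=""
    for svc in "$@"; do
        link=$(ls "$rcdir" 2>/dev/null | grep -E "^S[0-9][0-9]${svc}\$" | head -n 1)
        [ -n "$link" ] || return 1      # service has no start link here
        if [ -n "$prev" ]; then
            first=$(printf '%s\n%s\n' "$prev" "$link" | LC_ALL=C sort | head -n 1)
            [ "$first" = "$prev" ] || return 1   # earlier service would start later
        fi
        prev=$link
    done
    return 0
}

# Example (would be run on a cluster node, after "chkconfig <service> on"
# for each of the three services):
#   starts_in_order /etc/rc3.d cman clvmd pacemaker && echo "order OK"
```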
Is there anyone else out there that may be running a similar
configuration dlm/clvmd/[gfs/gfs2/ocfs] under pacemaker control?
Again, thanks for the info, I will do some more reading to ensure that I
am using the correct syntax for pcs to configure these services.
Thanks