[Pacemaker] Network outage debugging
Sean Lutner
sean at rentul.net
Tue Nov 12 19:10:56 UTC 2013
The folks testing the cluster I've been building ran a script that blocks all traffic except SSH on one node of the cluster for 15 seconds to mimic a network failure. While the network is "down", pacemaker on that node behaves oddly and ends up dying.
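The testers' script itself is not reproduced here. As a rough illustration only, a test of this kind could be sketched as below, assuming the blocking is done with iptables and that SSH listens on port 22. Both are assumptions, and the function names are hypothetical. Note that, as written, the DROP rules also cover the loopback interface.

```python
import subprocess
import time


def block_rules(ssh_port=22):
    """Build iptables commands that accept SSH and drop everything else.

    Assumption: SSH runs on ssh_port; the real test script may differ.
    """
    rules = []
    for chain, port_flag in (("INPUT", "--dport"), ("OUTPUT", "--sport")):
        # Rule 1: keep the SSH session alive in this direction.
        rules.append(["iptables", "-I", chain, "1", "-p", "tcp",
                      port_flag, str(ssh_port), "-j", "ACCEPT"])
        # Rule 2: drop all other traffic (cluster traffic included).
        rules.append(["iptables", "-I", chain, "2", "-j", "DROP"])
    return rules


def unblock_rules(rules):
    """Turn each insert (-I chain N spec) into a delete by rule spec (-D chain spec)."""
    return [[r[0], "-D", r[2]] + r[4:] for r in rules]


def simulate_outage(seconds=15, run=subprocess.check_call, sleep=time.sleep):
    """Apply the blocking rules, wait, then always remove them again."""
    rules = block_rules()
    try:
        for cmd in rules:
            run(cmd)
        sleep(seconds)
    finally:
        # Remove the rules even if something above failed, so the node
        # is not left cut off from its peer.
        for cmd in unblock_rules(rules):
            run(cmd)
```

Calling `simulate_outage(15)` as root on one node would apply the rules for 15 seconds. The `run` and `sleep` parameters exist only so the command sequence can be inspected without touching the firewall.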
The cluster has two nodes running four custom resources on EC2 instances. The OS is CentOS 6.4, and the cluster configuration is below.
I've attached /var/log/messages and /var/log/cluster/corosync.log from the time period of the test. I'm having some difficulty piecing together what happened and am hoping someone can shed some light on the problem. Are there any indications of why pacemaker is dying on that node?
[root@ip-10-50-3-122 ~]# pcs config
Corosync Nodes:

Pacemaker Nodes:
 ip-10-50-3-122 ip-10-50-3-251

Resources:
 Resource: ClusterEIP_54.215.143.166 (provider=pacemaker type=EIP class=ocf)
  Attributes: first_network_interface_id=eni-e4e0b68c second_network_interface_id=eni-35f9af5d first_private_ip=10.50.3.191 second_private_ip=10.50.3.91 eip=54.215.143.166 alloc_id=eipalloc-376c3c5f interval=5s
  Operations: monitor interval=5s
 Clone: EIP-AND-VARNISH-clone
  Group: EIP-AND-VARNISH
   Resource: Varnish (provider=redhat type=varnish.sh class=ocf)
    Operations: monitor interval=5s
   Resource: Varnishlog (provider=redhat type=varnishlog.sh class=ocf)
    Operations: monitor interval=5s
   Resource: Varnishncsa (provider=redhat type=varnishncsa.sh class=ocf)
    Operations: monitor interval=5s
 Resource: ec2-fencing (type=fence_ec2 class=stonith)
  Attributes: ec2-home=/opt/ec2-api-tools pcmk_host_check=static-list pcmk_host_list=HA01 HA02
  Operations: monitor start-delay=30s interval=0 timeout=150s

Location Constraints:
Ordering Constraints:
 ClusterEIP_54.215.143.166 then Varnish
 Varnish then Varnishlog
 Varnishlog then Varnishncsa
Colocation Constraints:
 Varnish with ClusterEIP_54.215.143.166
 Varnishlog with Varnish
 Varnishncsa with Varnishlog

Cluster Properties:
 dc-version: 1.1.8-7.el6-394e906
 cluster-infrastructure: cman
 last-lrm-refresh: 1384196963
 no-quorum-policy: ignore
 stonith-enabled: true
-------------- next part --------------
A non-text attachment was scrubbed...
Name: net-failure-messages-110913.out
Type: application/octet-stream
Size: 14360 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20131112/e9c9317e/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: net-failure-corosync-110913.out
Type: application/octet-stream
Size: 33468 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20131112/e9c9317e/attachment-0007.obj>