[Pacemaker] Network outage debugging

Sean Lutner sean at rentul.net
Tue Nov 12 14:10:56 EST 2013


The folks testing the cluster I've been building ran a script that blocks all traffic except SSH on one node of the cluster for 15 seconds to mimic a network failure. While the network is "down", pacemaker behaves oddly and ends up dying on that node.
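
For anyone wanting to reproduce this, here is a rough sketch of what such a test could look like. This is not the testers' actual script. It assumes iptables is used for the blocking, and the function names and the DRY_RUN and OUTAGE_SECONDS variables are made up for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: simulate a network outage on one node by dropping all
# traffic except inbound SSH sessions for a fixed period.
# Assumptions (not from the original test): iptables is the firewall tool;
# DRY_RUN, OUTAGE_SECONDS and the function names are hypothetical.

DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands instead of running them
OUTAGE_SECONDS="${OUTAGE_SECONDS:-15}"  # length of the simulated outage

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

block_all_but_ssh() {
    # Insert the SSH exceptions at the top of each chain...
    run iptables -I INPUT 1 -p tcp --dport 22 -j ACCEPT
    run iptables -I OUTPUT 1 -p tcp --sport 22 -j ACCEPT
    # ...then drop everything else right below them.
    run iptables -I INPUT 2 -j DROP
    run iptables -I OUTPUT 2 -j DROP
}

unblock() {
    # Remove the rules by specification rather than by position.
    run iptables -D INPUT -j DROP
    run iptables -D OUTPUT -j DROP
    run iptables -D INPUT -p tcp --dport 22 -j ACCEPT
    run iptables -D OUTPUT -p tcp --sport 22 -j ACCEPT
}

simulate_outage() {
    block_all_but_ssh
    run sleep "$OUTAGE_SECONDS"
    unblock
}

simulate_outage
```

Running it with DRY_RUN=0 as root would apply the rules for real, so it should only be tried on a node that can be reached again through a console if something goes wrong.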

The cluster has two nodes, both EC2 instances, running four custom resources. The OS is CentOS 6.4 with the config below:

I've attached the /var/log/messages and /var/log/cluster/corosync.log output covering the time of the test. I'm having some difficulty piecing together what happened and am hoping someone can shed some light on the problem. Are there any indications of why pacemaker is dying on that node?


[root@ip-10-50-3-122 ~]# pcs config
Corosync Nodes:
 
Pacemaker Nodes:
 ip-10-50-3-122 ip-10-50-3-251 

Resources: 
 Resource: ClusterEIP_54.215.143.166 (provider=pacemaker type=EIP class=ocf)
  Attributes: first_network_interface_id=eni-e4e0b68c second_network_interface_id=eni-35f9af5d first_private_ip=10.50.3.191 second_private_ip=10.50.3.91 eip=54.215.143.166 alloc_id=eipalloc-376c3c5f interval=5s 
  Operations: monitor interval=5s
 Clone: EIP-AND-VARNISH-clone
  Group: EIP-AND-VARNISH
   Resource: Varnish (provider=redhat type=varnish.sh class=ocf)
    Operations: monitor interval=5s
   Resource: Varnishlog (provider=redhat type=varnishlog.sh class=ocf)
    Operations: monitor interval=5s
   Resource: Varnishncsa (provider=redhat type=varnishncsa.sh class=ocf)
    Operations: monitor interval=5s
 Resource: ec2-fencing (type=fence_ec2 class=stonith)
  Attributes: ec2-home=/opt/ec2-api-tools pcmk_host_check=static-list pcmk_host_list=HA01 HA02 
  Operations: monitor start-delay=30s interval=0 timeout=150s

Location Constraints:
Ordering Constraints:
  ClusterEIP_54.215.143.166 then Varnish
  Varnish then Varnishlog
  Varnishlog then Varnishncsa
Colocation Constraints:
  Varnish with ClusterEIP_54.215.143.166
  Varnishlog with Varnish
  Varnishncsa with Varnishlog

Cluster Properties:
 dc-version: 1.1.8-7.el6-394e906
 cluster-infrastructure: cman
 last-lrm-refresh: 1384196963
 no-quorum-policy: ignore
 stonith-enabled: true

-------------- next part --------------
A non-text attachment was scrubbed...
Name: net-failure-messages-110913.out
Type: application/octet-stream
Size: 14360 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20131112/e9c9317e/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: net-failure-corosync-110913.out
Type: application/octet-stream
Size: 33468 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20131112/e9c9317e/attachment-0005.obj>