[Pacemaker] pacemaker dies without logs

Sun Sep 22 07:14:27 UTC 2013

Hi

I have a problem with a cluster where Pacemaker dies without leaving anything in the logs.
The problem started when I switched to CentOS 6.4 and converted the cluster from corosync to cman.
It typically happens when the system is under high load.
Tonight I received a notification of a DRBD split brain and found only these Pacemaker processes running on the primary machine:

 4420 ?        Ss     1:29 /usr/libexec/pacemaker/lrmd
 4422 ?        Ss     0:42 /usr/libexec/pacemaker/pengine
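
To make the gap in the listing above explicit, here is a minimal sketch that reports which Pacemaker daemons are absent from a process list. It assumes the usual Pacemaker 1.1.x daemon set (pacemakerd, cib, stonithd, lrmd, attrd, pengine, crmd); the function name missing_daemons is only illustrative and the list may need adjusting for other versions.

```shell
#!/bin/sh
# Daemons normally started by pacemakerd in Pacemaker 1.1.x (assumed set).
expected="pacemakerd cib stonithd lrmd attrd pengine crmd"

# Read a process listing on stdin and print each expected daemon
# that does not appear in it, one name per line.
missing_daemons() {
    running=$(cat)
    for d in $expected; do
        # Match the daemon name as a bare command or as the last path
        # component, followed by a space or the end of the line.
        printf '%s\n' "$running" | grep -Eq "(^|/)$d( |\$)" || printf '%s\n' "$d"
    done
}

# Example invocation on a live node (not run here):
#   ps -e -o args= | missing_daemons
```

Fed the two lines shown above, it prints pacemakerd, cib, stonithd, attrd and crmd, i.e. everything except lrmd and pengine is gone.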

On the secondary machine Pacemaker is fine.
The logs contain only the DRBD disconnect and split-brain notifications.
I tried Pacemaker 1.1.8 from CentOS and 1.1.9 and 1.1.10 from ClusterLabs, with the same result.

How can I debug this problem?
/etc/sysconfig/pacemaker has a lot of configuration options, but I am not sure which ones to use.
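
For reference, the fragment below sketches the settings in that file that look most relevant to debugging. The variable names PCMK_debug, PCMK_logfile and PCMK_blackbox are assumptions based on the commented examples shipped with Pacemaker 1.1.x and should be checked against the comments in the installed file; the log path is only an example.

```shell
# /etc/sysconfig/pacemaker (fragment, assumed variable names)

# Enable debug logging for all Pacemaker daemons.
PCMK_debug=yes

# Write Pacemaker's own log to a dedicated file (example path).
PCMK_logfile=/var/log/pacemaker.log

# Keep an in-memory "blackbox" of recent trace data that can be
# dumped when a daemon fails.
PCMK_blackbox=yes
```

The file is read when Pacemaker starts, so changes like these would presumably only take effect after a restart of the service on that node.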

The Pacemaker configuration is:

node ga1-ext \
        attributes standby="off"
node ga2-ext \
        attributes standby="off"
primitive ClusterIP ocf:heartbeat:IPaddr \
        params ip="10.12.23.3" cidr_netmask="24" \
        op monitor interval="30s"
primitive SharedFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/r0" directory="/shared" fstype="ext4" options="noatime,nobarrier"
primitive dovecot lsb:dovecot
primitive drbd0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="15s"
primitive drbdlinks ocf:tummy:drbdlinks
primitive mail ocf:heartbeat:MailTo \
        params email="root@company.com" subject="ga-ext cluster - "
primitive mysql lsb:mysqld
group service_group SharedFS drbdlinks ClusterIP mail mysql dovecot \
        meta target-role="Started"
ms ms_drbd0 drbd0 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation service_on_drbd inf: service_group ms_drbd0:Master
order service_after_drbd inf: ms_drbd0:promote service_group:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-1.el6-368c726" \
        cluster-infrastructure="cman" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1379831462" \
        maintenance-mode="false"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

The cman configuration is:

cat /etc/cluster/cluster.conf 

<cluster config_version="6" name="ga-ext_cluster">
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="ga1-ext" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="ga1-ext"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="ga2-ext" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="ga2-ext"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk"/>
  </fencedevices>
</cluster>

Let me know if you need any other information.

Thank you