[Pacemaker] OCFS2 fencing regulated by Pacemaker?

Darren.Mansell at opengi.co.uk Darren.Mansell at opengi.co.uk
Thu Feb 11 05:11:33 EST 2010


Hello.

Yes, we get the same kind of thing. SLES11 HAE 64-bit.

Average uptime of the boxes is about a week at the moment. We also have 3 nodes
using OCFS2 / cLVMD:

node OGG-NODE-01                    
node OGG-NODE-02 \                  
        attributes standby="off"           
node OGG-NODE-03                    
primitive STONITH-1 stonith:external/ibmrsa-telnet \
        params nodename="OGG-NODE-01" ip_address="192.168.1.12" password="PASSWORD" username="USERID" \
        op monitor interval="1h" timeout="1m" \
        op startup interval="0" timeout="1m" \
        meta target-role="Started"

primitive STONITH-2 stonith:external/ibmrsa-telnet \
        params nodename="OGG-NODE-02" ip_address="192.168.1.22" password="PASSWORD" username="USERID" \
        op monitor interval="1h" timeout="1m" \
        op startup interval="0" timeout="1m" \
        meta target-role="Started"

primitive STONITH-3 stonith:external/ibmrsa-telnet \
        params nodename="OGG-NODE-03" ip_address="192.168.1.32" password="PASSWORD" username="USERID" \
        op monitor interval="1h" timeout="1m" \
        meta target-role="Started"

primitive Virtual-IP-App1 ocf:heartbeat:IPaddr2 \
        params lvs_support="true" ip="192.168.1.100" cidr_netmask="24" broadcast="192.168.1.255" \
        op monitor interval="1m" timeout="10s" \
        meta migration-threshold="10"

primitive Virtual-IP-App2 ocf:heartbeat:IPaddr2 \
        params lvs_support="true" ip="192.168.1.103" cidr_netmask="24" broadcast="192.168.1.255" \
        op monitor interval="1m" timeout="10s" \
        meta migration-threshold="10"

primitive ldirectord ocf:heartbeat:ldirectord \
        params configfile="/etc/ha.d/ldirectord.cf" \
        op monitor interval="2m" timeout="20s" \
        meta migration-threshold="10" target-role="Started"
primitive App1 lsb:App1 \
        op monitor interval="10s" enabled="true" timeout="10s" \
        meta target-role="Started"
primitive App2 lsb:App2 \
        op monitor interval="10s" enabled="true" timeout="10s" \
        meta target-role="Started"
primitive dlm ocf:pacemaker:controld \
        op monitor interval="120s"
primitive o2cb ocf:ocfs2:o2cb \
        op monitor interval="2m"
primitive fs ocf:heartbeat:Filesystem \
        params device="/dev/dm-0" directory="/opt/SAN/" fstype="ocfs2" \
        op monitor interval="120s"
group Load-Balancing Virtual-IP-App1 Virtual-IP-App2 ldirectord
clone cl-App1 App1
clone cl-App2 App2
clone dlm-clone dlm \
        meta globally-unique="false" interleave="true" target-role="Started"
clone o2cb-clone o2cb \
        meta globally-unique="false" interleave="true" target-role="Started"
clone fs-clone fs \
        meta interleave="true" ordered="true" target-role="Started"
location l-st-1 STONITH-1 -inf: OGG-NODE-01
location l-st-2 STONITH-2 -inf: OGG-NODE-02
location l-st-3 STONITH-3 -inf: OGG-NODE-03
location Prefer-Node1 ldirectord \
        rule $id="prefer-node1-rule" 100: #uname eq OGG-NODE-01
colocation o2cb-with-dlm inf: o2cb-clone dlm-clone
colocation fs-with-o2cb inf: fs-clone o2cb-clone
order start-o2cb-after-dlm inf: dlm-clone o2cb-clone
order start-fs-after-o2cb inf: o2cb-clone fs-clone
order start-app1-after-fs inf: fs-clone cl-App1
order start-app2-after-fs inf: fs-clone cl-App2
property $id="cib-bootstrap-options" \
        dc-version="1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a" \
        expected-quorum-votes="3" \
        no-quorum-policy="ignore" \
        start-failure-is-fatal="false" \
        stonith-action="reboot" \
        last-lrm-refresh="1265882628" \
        stonith-enabled="true"

We seem to have randomly picked up a standby="off" node attribute. I
can't see it causing any problems, but I'm too afraid to make any
changes at the moment in case we have a(nother) shootout.
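For what it's worth, the attribute can usually be inspected and cleared with the crm shell without touching the rest of the CIB. This is only a sketch against our node names above; run it on a quiet cluster first:

```shell
# Show the node's entry in the CIB, including any instance attributes
crm configure show OGG-NODE-02

# Delete the standby attribute from the node (crmsh syntax)
crm node attribute OGG-NODE-02 delete standby

# Equivalently, just force the node active; this sets standby="off",
# which is harmless but leaves the attribute in place
crm node online OGG-NODE-02
```

A lingering standby="off" is normally just the leftover of a previous `crm node standby` / `crm node online` cycle, so deleting it should be a no-op for resource placement.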

-----Original Message-----
From: Sander van Vugt [mailto:mail at sandervanvugt.nl] 
Sent: 11 February 2010 08:30
To: pacemaker at clusterlabs.org
Subject: [Pacemaker] OCFS2 fencing regulated by Pacemaker?

Hi,

I'm trying to set up OCFS2 in a Pacemaker environment (SLES11 with HAE),
in a 3-node cluster. I have successfully configured two volumes, the dlm
and the o2cb resource. But if I shut down one of the nodes, at least
one other node (and sometimes even two!) fences itself.

I've been looking for a way to control this behavior, but can't find
anything.

Does anyone have a clue?
Thanks,
Sander


_______________________________________________
Pacemaker mailing list
Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker



