[Pacemaker] OpenAIS/cman/pacemaker problem

Thu Sep 16 10:14:06 UTC 2010

On [Thu, 16.09.2010 11:21], Andrew Beekhof wrote:
>Technically the subject is incorrect - it's a DRBD issue.

ack. :)

>Can someone from linbit have a look?

Actually, this seems to be a problem only with fence_ack_manual. I've
tested with a different fence device, and it worked without problems.
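
To double-check which fence agent each node ends up with, something like
the following Python sketch can parse the cluster.conf quoted below. The
function name fence_agents_by_node is made up for illustration, and the
config is pasted inline rather than read from /etc/cluster/cluster.conf:

```python
import xml.etree.ElementTree as ET

# Inline copy of the cluster.conf from the original report.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="iscsicluster" config_version="2">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="iscsi1" votes="1" nodeid="1">
      <fence>
        <method name="1">
          <device name="manual" nodename="iscsi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="iscsi2" votes="1" nodeid="2">
      <fence>
        <method name="1">
          <device name="manual" nodename="iscsi2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_manual" name="manual"/>
  </fencedevices>
  <rm/>
</cluster>
"""


def fence_agents_by_node(xml_text):
    """Map each cluster node name to the list of fence agents it references."""
    root = ET.fromstring(xml_text)
    # Resolve fence device names (e.g. "manual") to their agents.
    agents = {dev.get("name"): dev.get("agent") for dev in root.iter("fencedevice")}
    result = {}
    for node in root.iter("clusternode"):
        device_names = [dev.get("name") for dev in node.iter("device")]
        result[node.get("name")] = [agents.get(name) for name in device_names]
    return result


if __name__ == "__main__":
    for node, node_agents in fence_agents_by_node(CLUSTER_CONF).items():
        print(node, node_agents)
```

For the config above, this maps both nodes to fence_manual.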

>On Wed, Sep 15, 2010 at 8:43 PM, Thorsten Scherf <tscherf at redhat.com> wrote:
>> Hey,
>>
>> I'm currently trying the latest Pacemaker RPM on Fedora rawhide together with
>> cman/OpenAIS:
>>
>> cman-3.0.16-1.fc15.i686
>> openais-1.1.4-1.fc15.i686
>> pacemaker-1.1.2-7.fc13.i386 (rebuild from rhel6 beta)
>>
>> I have a very basic cluster.conf (only for testing):
>>
>> # cat /etc/cluster/cluster.conf
>> <?xml version="1.0"?>
>> <cluster name="iscsicluster" config_version="2">
>>  <cman two_node="1" expected_votes="1"/>
>>  <clusternodes>
>>    <clusternode name="iscsi1" votes="1" nodeid="1">
>>      <fence>
>>                <method name="1">
>>                        <device name="manual" nodename="iscsi1"/>
>>                </method>
>>      </fence>
>>    </clusternode>
>>    <clusternode name="iscsi2" votes="1" nodeid="2">
>>      <fence>
>>                <method name="1">
>>                        <device name="manual" nodename="iscsi2"/>
>>                </method>
>>      </fence>
>>    </clusternode>
>>  </clusternodes>
>>  <fencedevices>
>>        <fencedevice agent="fence_manual" name="manual"/>
>>  </fencedevices>
>>  <rm/>
>> </cluster>
>>
>> pacemaker config looks like this:
>>
>> # crm configure show
>> node iscsi1
>> node iscsi2
>> primitive drbd_disk ocf:linbit:drbd \
>>        params drbd_resource="virt_machines" \
>>        op monitor interval="15s"
>> primitive ip_drbd ocf:heartbeat:IPaddr2 \
>>        params ip="192.168.122.100" cidr_netmask="24" \
>>        op monitor interval="10s"
>> primitive iscsi_lsb lsb:tgtd \
>>        op monitor interval="10s"
>> group rg_iscsi iscsi_lsb ip_drbd \
>>        meta target-role="Started"
>> ms ms_drbd_disk drbd_disk \
>>        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
>> location cli-prefer-rg_iscsi rg_iscsi \
>>        rule $id="cli-prefer-rule-rg_iscsi" inf: #uname eq iscsi2
>> colocation c_iscsi_on_drbd inf: rg_iscsi ms_drbd_disk:Master
>> order o_drbd_before_iscsi inf: ms_drbd_disk:promote rg_iscsi:start
>> property $id="cib-bootstrap-options" \
>>        dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
>>        cluster-infrastructure="cman" \
>>        stonith-enabled="false" \
>>        no-quorum-policy="ignore"
>>
>> This works fine so far:
>>
>> # crm_mon
>> ============
>> Last updated: Wed Sep 15 18:06:42 2010
>> Stack: cman
>> Current DC: iscsi1 - partition with quorum
>> Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ iscsi1 iscsi2 ]
>> Resource Group: rg_iscsi
>>     iscsi_lsb  (lsb:tgtd):     Started iscsi1
>>     ip_drbd    (ocf::heartbeat:IPaddr2):       Started iscsi1
>>  Master/Slave Set: ms_drbd_disk
>>     Masters: [ iscsi1 ]
>>     Slaves: [ iscsi2 ]
>>
>> For testing, no real fence device is configured (only fence_manual); I'm
>> using fence_ack_manual to confirm node shutdown, and that's exactly where
>> the problem is. When I switched off iscsi1, no resource failover happened
>> after I called fence_ack_manual:
>>
>> /var/log/messages:
>> Sep 15 18:09:02 iscsi2 fenced[1171]: fence iscsi1 failed
>>
>> # fence_ack_manual
>>
>> /var/log/messages:
>> Sep 15 18:09:08 iscsi2 fenced[1171]: fence iscsi1 overridden by administrator intervention
>>
>> # crm_mon
>> ============
>> Last updated: Wed Sep 15 18:09:26 2010
>> Stack: cman
>> Current DC: iscsi2 - partition with quorum
>> Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ iscsi2 ]
>> OFFLINE: [ iscsi1 ]
>>
>>  Master/Slave Set: ms_drbd_disk
>>     Slaves: [ iscsi2 ]
>>     Stopped: [ drbd_disk:0 ]
>>
>> Failed actions:
>>    drbd_disk:1_promote_0 (node=iscsi2, call=11, rc=1, status=complete): unknown error
>>
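
For what it's worth, rc=1 in the failed promote is the generic OCF error
(OCF_ERR_GENERIC), which crm_mon prints as "unknown error", so the status
line alone doesn't say why the promote failed. Below is a rough Python
sketch for pulling the fields out of such a line. The regex and the
function name parse_failed_action are my own assumptions based on the line
format shown above, not anything shipped with Pacemaker:

```python
import re

# Standard OCF resource agent exit codes.
OCF_RC_NAMES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Assumed layout: <resource>_<operation>_<interval> (node=..., call=..., rc=..., status=...): <message>
FAILED_ACTION_RE = re.compile(
    r"^\s*(?P<resource>\S+)_(?P<operation>[a-z]+)_(?P<interval>\d+)\s+"
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+), rc=(?P<rc>\d+), "
    r"status=(?P<status>[^)]+)\):\s*(?P<message>.*)$"
)


def parse_failed_action(line):
    """Split one 'Failed actions' line into its fields, or return None if it does not match."""
    match = FAILED_ACTION_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("interval", "call", "rc"):
        fields[key] = int(fields[key])
    # Attach the symbolic OCF name for the numeric return code.
    fields["rc_name"] = OCF_RC_NAMES.get(fields["rc"], "unknown")
    return fields


LINE = ("   drbd_disk:1_promote_0 (node=iscsi2, call=11, rc=1, "
        "status=complete): unknown error")

if __name__ == "__main__":
    print(parse_failed_action(LINE))
```

The actual reason for the failure would have to come from the drbd
resource agent's own log messages on iscsi2.
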
>> The output of cibadmin -Q is available here:
>> http://pastebin.com/gRUwwVFF
>>
>> I'm wondering why no service failover happened after I manually confirmed
>> the shutdown of the first node with fence_ack_manual.
>>
>> Maybe someone knows what's going on?
>>
>> Cheers,
>> Thorsten
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs:
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>