[Pacemaker] OpenAIS/cman/pacemaker problem

Andrew Beekhof andrew at beekhof.net
Mon Sep 20 06:29:01 EDT 2010


Thorsten, can you send the crm_report archive to the list please?

I had a look, and as far as I can tell the only reason drbd isn't
being promoted is that it has a promotion score of -1,
which is definitely linbit's department :-)
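
For anyone who wants to check the scores on their own cluster, here is a minimal sketch. It assumes the score dump (e.g. from `ptest -sL`, or `crm_simulate -sL` on newer builds) prints lines of the form `<resource> promotion score on <node>: <score>`. The sample text is illustrative, not real output from Thorsten's cluster, and `parse_score` and `negative_promotion_scores` are hypothetical helper names.

```python
import re

# Assumed line format of the score dump:
#   "<resource> promotion score on <node>: <score>"
SCORE_LINE = re.compile(r"^(\S+) promotion score on (\S+): (\S+)$")

INFINITY = 1000000  # Pacemaker's numeric value for INFINITY


def parse_score(text):
    """Convert a score string such as '-1', 'INFINITY' or '-INFINITY' to an int."""
    if text in ("INFINITY", "+INFINITY"):
        return INFINITY
    if text == "-INFINITY":
        return -INFINITY
    return int(text)


def negative_promotion_scores(dump):
    """Return (resource, node, score) for every instance with a negative promotion score."""
    blocked = []
    for line in dump.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match is None:
            continue  # not a promotion score line
        resource, node, raw = match.groups()
        score = parse_score(raw)
        if score < 0:
            blocked.append((resource, node, score))
    return blocked


# Hypothetical sample, not real output from this cluster.
sample = """\
drbd_disk:0 promotion score on none: 0
drbd_disk:1 promotion score on iscsi2: -1
"""

for resource, node, score in negative_promotion_scores(sample):
    print(f"{resource} on {node}: promotion score {score}, will not be promoted")
```

The score itself is set by the resource agent, so a negative value here points at the drbd side rather than at the policy engine.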

On Thu, Sep 16, 2010 at 12:14 PM, Thorsten Scherf <tscherf at redhat.com> wrote:
> On [Thu, 16.09.2010 11:21], Andrew Beekhof wrote:
>>
>> Technically the subject is incorrect - it's a drbd issue.
>
> ack. :)
>
>> Can someone from linbit have a look?
>
> Actually, this seems to be a problem only with fence_ack_manual. I've
> tested with a different fence_device and this worked without problems.
>
>
>> On Wed, Sep 15, 2010 at 8:43 PM, Thorsten Scherf <tscherf at redhat.com> wrote:
>>>
>>> Hey,
>>>
>>> I'm currently trying the latest pacemaker RPM on Fedora rawhide together with
>>> cman/OpenAIS:
>>>
>>> cman-3.0.16-1.fc15.i686
>>> openais-1.1.4-1.fc15.i686
>>> pacemaker-1.1.2-7.fc13.i386 (rebuilt from rhel6 beta)
>>>
>>> I have a very basic cluster.conf (only for testing):
>>>
>>> # cat /etc/cluster/cluster.conf
>>> <?xml version="1.0"?>
>>> <cluster name="iscsicluster" config_version="2">
>>>   <cman two_node="1" expected_votes="1"/>
>>>   <clusternodes>
>>>     <clusternode name="iscsi1" votes="1" nodeid="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="manual" nodename="iscsi1"/>
>>>         </method>
>>>       </fence>
>>>     </clusternode>
>>>     <clusternode name="iscsi2" votes="1" nodeid="2">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="manual" nodename="iscsi2"/>
>>>         </method>
>>>       </fence>
>>>     </clusternode>
>>>   </clusternodes>
>>>   <fencedevices>
>>>     <fencedevice agent="fence_manual" name="manual"/>
>>>   </fencedevices>
>>>   <rm/>
>>> </cluster>
>>>
>>> pacemaker config looks like this:
>>>
>>> # crm configure show
>>> node iscsi1
>>> node iscsi2
>>> primitive drbd_disk ocf:linbit:drbd \
>>>        params drbd_resource="virt_machines" \
>>>        op monitor interval="15s"
>>> primitive ip_drbd ocf:heartbeat:IPaddr2 \
>>>        params ip="192.168.122.100" cidr_netmask="24" \
>>>        op monitor interval="10s"
>>> primitive iscsi_lsb lsb:tgtd \
>>>        op monitor interval="10s"
>>> group rg_iscsi iscsi_lsb ip_drbd \
>>>        meta target-role="Started"
>>> ms ms_drbd_disk drbd_disk \
>>>        meta master-max="1" master-node-max="1" clone-max="2" \
>>>        clone-node-max="1" notify="true" target-role="Master"
>>> location cli-prefer-rg_iscsi rg_iscsi \
>>>        rule $id="cli-prefer-rule-rg_iscsi" inf: #uname eq iscsi2
>>> colocation c_iscsi_on_drbd inf: rg_iscsi ms_drbd_disk:Master
>>> order o_drbd_before_iscsi inf: ms_drbd_disk:promote rg_iscsi:start
>>> property $id="cib-bootstrap-options" \
>>>        dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
>>>        cluster-infrastructure="cman" \
>>>        stonith-enabled="false" \
>>>        no-quorum-policy="ignore"
>>>
>>> this works fine so far:
>>>
>>> # crm_mon
>>> ============
>>> Last updated: Wed Sep 15 18:06:42 2010
>>> Stack: cman
>>> Current DC: iscsi1 - partition with quorum
>>> Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
>>> 2 Nodes configured, unknown expected votes
>>> 2 Resources configured.
>>> ============
>>>
>>> Online: [ iscsi1 iscsi2 ]
>>> Resource Group: rg_iscsi
>>>     iscsi_lsb  (lsb:tgtd):     Started iscsi1
>>>     ip_drbd    (ocf::heartbeat:IPaddr2):       Started iscsi1
>>>  Master/Slave Set: ms_drbd_disk
>>>     Masters: [ iscsi1 ]
>>>     Slaves: [ iscsi2 ]
>>>
>>> For testing, no real fence device is configured (only fence_manual). I'm
>>> using fence_ack_manual to confirm node shutdown, and that's exactly where
>>> the problem is. When I switch off iscsi1, no resource failover happens
>>> after I call fence_ack_manual:
>>>
>>> /var/log/messages:
>>> Sep 15 18:09:02 iscsi2 fenced[1171]: fence iscsi1 failed
>>>
>>> # fence_ack_manual
>>>
>>> /var/log/messages:
>>> Sep 15 18:09:08 iscsi2 fenced[1171]: fence iscsi1 overridden by administrator intervention
>>>
>>> # crm_mon
>>> ============
>>> Last updated: Wed Sep 15 18:09:26 2010
>>> Stack: cman
>>> Current DC: iscsi2 - partition with quorum
>>> Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
>>> 2 Nodes configured, unknown expected votes
>>> 2 Resources configured.
>>> ============
>>>
>>> Online: [ iscsi2 ]
>>> OFFLINE: [ iscsi1 ]
>>>
>>>  Master/Slave Set: ms_drbd_disk
>>>     Slaves: [ iscsi2 ]
>>>     Stopped: [ drbd_disk:0 ]
>>>
>>> Failed actions:
>>>    drbd_disk:1_promote_0 (node=iscsi2, call=11, rc=1, status=complete): unknown error
>>>
>>> The output of cibadmin -Q is available here:
>>> http://pastebin.com/gRUwwVFF
>>>
>>> I'm wondering why no service failover happened after I manually confirmed
>>> the shutdown of the first node with fence_ack_manual.
>>>
>>> Maybe someone knows what's going on?!
>>>
>>> Cheers,
>>> Thorsten
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs:
>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>



