[Pacemaker] OpenAIS/cman/pacemaker problem
Thorsten Scherf
tscherf at redhat.com
Thu Sep 16 10:14:06 UTC 2010
On [Thu, 16.09.2010 11:21], Andrew Beekhof wrote:
>Technically the subject is incorrect - it's a drbd issue.
ack. :)
>Can someone from linbit have a look?
Actually, this seems to be a problem only with fence_ack_manual. I've
tested with a different fence device and it worked without problems.
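
As an aside for anyone reproducing this: the sketch below is not from the
original thread. It is a minimal Python example, assuming only the standard
library, that reads a cluster.conf like the one quoted below and reports which
fence agent each node ends up using. The function name fence_agents_by_node
and the embedded CLUSTER_CONF string are illustrative names, not part of cman
or Pacemaker.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf quoted in this thread (assumption: the
# real file lives at /etc/cluster/cluster.conf and has the same layout).
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="iscsicluster" config_version="2">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="iscsi1" votes="1" nodeid="1">
      <fence>
        <method name="1">
          <device name="manual" nodename="iscsi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="iscsi2" votes="1" nodeid="2">
      <fence>
        <method name="1">
          <device name="manual" nodename="iscsi2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_manual" name="manual"/>
  </fencedevices>
  <rm/>
</cluster>
"""


def fence_agents_by_node(conf_xml):
    """Map each cluster node name to the list of fence agents it references."""
    root = ET.fromstring(conf_xml)
    # Resolve fencedevice names (e.g. "manual") to their agents.
    agents = {dev.get("name"): dev.get("agent")
              for dev in root.iter("fencedevice")}
    result = {}
    for node in root.iter("clusternode"):
        device_names = [dev.get("name") for dev in node.iter("device")]
        # A device name with no matching fencedevice is reported as undefined.
        result[node.get("name")] = [agents.get(name, "<undefined>")
                                    for name in device_names]
    return result


if __name__ == "__main__":
    for node_name, node_agents in fence_agents_by_node(CLUSTER_CONF).items():
        print(f"{node_name}: {', '.join(node_agents)}")
```

With the configuration from this thread, both nodes map to fence_manual, which
is the setup where fence_ack_manual comes into play.
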
>On Wed, Sep 15, 2010 at 8:43 PM, Thorsten Scherf <tscherf at redhat.com> wrote:
>> Hey,
>>
>> I'm currently trying latest pacemaker RPM on Fedora rawhide together with
>> cman/OpenAIS:
>>
>> cman-3.0.16-1.fc15.i686
>> openais-1.1.4-1.fc15.i686
>> pacemaker-1.1.2-7.fc13.i386 (rebuild from rhel6 beta)
>>
>> I have a very basic cluster.conf (only for testing):
>>
>> # cat /etc/cluster/cluster.conf
>> <?xml version="1.0"?>
>> <cluster name="iscsicluster" config_version="2">
>> <cman two_node="1" expected_votes="1"/>
>> <clusternodes>
>> <clusternode name="iscsi1" votes="1" nodeid="1">
>> <fence>
>> <method name="1">
>> <device name="manual" nodename="iscsi1"/>
>> </method>
>> </fence>
>> </clusternode>
>> <clusternode name="iscsi2" votes="1" nodeid="2">
>> <fence>
>> <method name="1">
>> <device name="manual" nodename="iscsi2"/>
>> </method>
>> </fence>
>> </clusternode>
>> </clusternodes>
>> <fencedevices>
>> <fencedevice agent="fence_manual" name="manual"/>
>> </fencedevices>
>> <rm/>
>> </cluster>
>>
>> The Pacemaker config looks like this:
>>
>> # crm configure show
>> node iscsi1
>> node iscsi2
>> primitive drbd_disk ocf:linbit:drbd \
>> params drbd_resource="virt_machines" \
>> op monitor interval="15s"
>> primitive ip_drbd ocf:heartbeat:IPaddr2 \
>> params ip="192.168.122.100" cidr_netmask="24" \
>> op monitor interval="10s"
>> primitive iscsi_lsb lsb:tgtd \
>> op monitor interval="10s"
>> group rg_iscsi iscsi_lsb ip_drbd \
>> meta target-role="Started"
>> ms ms_drbd_disk drbd_disk \
>> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
>> location cli-prefer-rg_iscsi rg_iscsi \
>> rule $id="cli-prefer-rule-rg_iscsi" inf: #uname eq iscsi2
>> colocation c_iscsi_on_drbd inf: rg_iscsi ms_drbd_disk:Master
>> order o_drbd_before_iscsi inf: ms_drbd_disk:promote rg_iscsi:start
>> property $id="cib-bootstrap-options" \
>> dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
>> cluster-infrastructure="cman" \
>> stonith-enabled="false" \
>> no-quorum-policy="ignore"
>>
>> This works fine so far:
>>
>> # crm_mon
>> ============
>> Last updated: Wed Sep 15 18:06:42 2010
>> Stack: cman
>> Current DC: iscsi1 - partition with quorum
>> Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ iscsi1 iscsi2 ]
>> Resource Group: rg_iscsi
>> iscsi_lsb (lsb:tgtd): Started iscsi1
>> ip_drbd (ocf::heartbeat:IPaddr2): Started iscsi1
>> Master/Slave Set: ms_drbd_disk
>> Masters: [ iscsi1 ]
>> Slaves: [ iscsi2 ]
>>
>> For testing, no fence device is configured. I'm using fence_ack_manual to
>> confirm node shutdown, but that's exactly the problem. When I switched off
>> iscsi1, no resource failover happened after I called fence_ack_manual:
>>
>> /var/log/messages:
>> Sep 15 18:09:02 iscsi2 fenced[1171]: fence iscsi1 failed
>>
>> # fence_ack_manual
>>
>> /var/log/messages:
>> Sep 15 18:09:08 iscsi2 fenced[1171]: fence iscsi1 overridden by administrator intervention
>>
>> # crm_mon
>> ============
>> Last updated: Wed Sep 15 18:09:26 2010
>> Stack: cman
>> Current DC: iscsi2 - partition with quorum
>> Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
>> 2 Nodes configured, unknown expected votes
>> 2 Resources configured.
>> ============
>>
>> Online: [ iscsi2 ]
>> OFFLINE: [ iscsi1 ]
>>
>> Master/Slave Set: ms_drbd_disk
>> Slaves: [ iscsi2 ]
>> Stopped: [ drbd_disk:0 ]
>>
>> Failed actions:
>> drbd_disk:1_promote_0 (node=iscsi2, call=11, rc=1, status=complete): unknown error
>>
>> The output of "cibadmin -Q" is available here:
>> http://pastebin.com/gRUwwVFF
>>
>> I'm wondering why no service failover happened after I manually confirmed
>> the shutdown of the first node with fence_ack_manual.
>>
>> Maybe someone knows what's going on?
>>
>> Cheers,
>> Thorsten
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>