[Pacemaker] colocation conundrum

Sun Nov 25 23:24:07 UTC 2012

On Wed, Nov 21, 2012 at 5:56 AM, Craig Donnelly <craig at goaf.net> wrote:
> Hi there,
>
> I think I've exhausted everything I can find online in trying to solve my problem, so here goes with a posting to see if anyone on this mailing list might be able to help.
>
> I have a Pacemaker 1.1.7 / Corosync 1.4.1 two-node cluster running on CentOS 6.3.
> I'm using this cluster to support shared storage using a combination of LVM and iSCSI.
>
> Now failover works fine if I offline/stonith a node. However when I bring the node back online they enter a death-match situation.
> I see the issue as being with ordering/colocation/resource sets

Either it's not a death-match, or the constraints are not the problem.

Most likely (from what you've described), the constraints are
causing resource "issues", including the inability to stop.
This then leads to node fencing (the only way to clean up resource
stop failures).

Looking at your config, Jake's advice looks pretty good.
Use ordering constraints to make sure things don't get started until
everything they need is available.
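
As a rough sketch only (reusing the resource names from your config
below; the constraint IDs san1order and san2order are made up, and I
have not tested this against your cluster), ordering constraints along
these lines would make each volume group start before its iSCSI groups.
Ordering is symmetrical by default, so the groups would also be stopped
before the volume group is deactivated:

====
order san1order inf: cs1vg1 ( cs1lb1grp cs1man1grp cs1master1grp cs1ddb1grp cs1dws1grp )
order san2order inf: cs1vg2 cs1lb2grp
====

The parentheses mark a set whose members need not be ordered relative
to each other once cs1vg1 is up. Something like "crm configure verify"
is worth running before committing a change of this kind.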

> and I have tried a bunch of different variations and read and re-read all the information I can find online without resolution.
>
> Would really appreciate any help/advice.
>
> The key entries that I can see in the logs are:
>
> NODE1:
> ======
> Nov 20 12:16:38 cs1san1 iSCSILogicalUnit(cs1lb1l1)[2710]: ERROR: tgtadm: invalid request
> Nov 20 12:16:39 cs1san1 iSCSILogicalUnit(cs1man1l1)[2807]: ERROR: tgtadm: invalid request
> Nov 20 12:22:55 cs1san1 iSCSILogicalUnit(cs1master1l1)[4482]: ERROR: tgtadm: invalid request
> Nov 20 12:23:17 cs1san1 iSCSILogicalUnit(cs1ddb1l1)[4968]: ERROR: tgtadm: invalid request
> Nov 20 12:23:18 cs1san1 iSCSILogicalUnit(cs1master1l1)[5081]: ERROR: tgtadm: invalid request
> Nov 20 12:30:28 cs1san1 iSCSILogicalUnit(cs1lb1l1)[2670]: ERROR: tgtadm: invalid request
>
> NODE2:
> ======
> Nov 20 12:16:38 cs1san2 LVM(cs1vg1)[22039]: ERROR: Can't deactivate volume group "cs1vg1" with 3 open logical volume(s)
> Nov 20 12:22:55 cs1san2 LVM(cs1vg1)[3386]: ERROR: Can't deactivate volume group "cs1vg1" with 2 open logical volume(s)
> Nov 20 12:23:17 cs1san2 LVM(cs1vg1)[4296]: ERROR: Can't deactivate volume group "cs1vg1" with 1 open logical volume(s)
> Nov 20 12:30:27 cs1san2 LVM(cs1vg1)[14943]: ERROR: Can't deactivate volume group "cs1vg1" with 4 open logical volume(s)
>
> which, to me, clearly indicates an ordering issue yet the configuration I have follows the colocation/ordering rules in as much as I can understand them.
>
> My "current" CRM config is as follows:
> ==============================================================================
> node cs1san1 \
>         attributes standby="off"
> node cs1san2 \
>         attributes standby="off"
> primitive alert ocf:heartbeat:MailTo \
>         params email="ops at xyz.com" subject="CS takeover event" \
>         op monitor interval="10s"
> primitive cs1ddb1l1 ocf:heartbeat:iSCSILogicalUnit \
>         params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1ddb1d1" lun="1" path="/dev/cs1vg1/cs1ddb1d1" \
>         op monitor interval="10" timeout="15"
> primitive cs1ddb1t1 ocf:heartbeat:iSCSITarget \
>         params iqn="iqn.2012-10.com.xyz.cs1san1:cs1ddb1d1" tid="7" \
>         op monitor interval="10" timeout="15"
> primitive cs1dws1l1 ocf:heartbeat:iSCSILogicalUnit \
>         params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1dws1d1" lun="1" path="/dev/cs1vg1/cs1dws1d1" \
>         op monitor interval="10" timeout="15"
> primitive cs1dws1t1 ocf:heartbeat:iSCSITarget \
>         params iqn="iqn.2012-10.com.xyz.cs1san1:cs1dws1d1" tid="8" \
>         op monitor interval="10" timeout="15"
> primitive cs1lb1l1 ocf:heartbeat:iSCSILogicalUnit \
>         params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1lb1d1" lun="1" path="/dev/cs1vg1/cs1lb1d1" \
>         op start interval="0" timeout="15" \
>         op stop interval="0" timeout="15" \
>         op monitor interval="10" timeout="15" \
>         meta is-managed="true"
> primitive cs1lb1t1 ocf:heartbeat:iSCSITarget \
>         params iqn="iqn.2012-10.com.xyz.cs1san1:cs1lb1d1" tid="1" \
>         op monitor interval="10" timeout="15"
> primitive cs1lb2l1 ocf:heartbeat:iSCSILogicalUnit \
>         params target_iqn="iqn.2012-10.com.xyz.cs1san2:cs1lb2d1" lun="1" path="/dev/cs1vg2/cs1lb2d1" \
>         op start interval="0" timeout="15" \
>         op stop interval="0" timeout="15" \
>         op monitor interval="10" timeout="15"
> primitive cs1lb2t1 ocf:heartbeat:iSCSITarget \
>         params iqn="iqn.2012-10.com.xyz.cs1san2:cs1lb2d1" tid="2" \
>         op monitor interval="10" timeout="15"
> primitive cs1man1l1 ocf:heartbeat:iSCSILogicalUnit \
>         params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1man1d1" lun="1" path="/dev/cs1vg1/cs1man1d1" \
>         op monitor interval="10" timeout="15"
> primitive cs1man1t1 ocf:heartbeat:iSCSITarget \
>         params iqn="iqn.2012-10.com.xyz.cs1san1:cs1man1d1" tid="5" \
>         op monitor interval="10" timeout="15"
> primitive cs1master1l1 ocf:heartbeat:iSCSILogicalUnit \
>         params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1master1d1" lun="1" path="/dev/cs1vg1/cs1master1d1" \
>         op monitor interval="10" timeout="15"
> primitive cs1master1t1 ocf:heartbeat:iSCSITarget \
>         params iqn="iqn.2012-10.com.xyz.cs1san1:cs1master1d1" tid="6" \
>         op monitor interval="10" timeout="15"
> primitive cs1vg1 ocf:heartbeat:LVM \
>         params exclusive="true" volgrpname="cs1vg1" \
>         op start interval="0" timeout="30s" \
>         op stop interval="0" timeout="30s" \
>         meta target-role="Started"
> primitive cs1vg2 ocf:heartbeat:LVM \
>         params exclusive="true" volgrpname="cs1vg2" \
>         op start interval="0" timeout="30s" \
>         op stop interval="0" timeout="30s" \
>         meta target-role="Started"
> primitive ping ocf:pacemaker:ping \
>         params host_list="10.96.0.1 10.96.0.2" attempts="3" timeout="2s" multiplier="100" dampen="5s" \
>         op monitor interval="10s"
> primitive san1fencer stonith:fence_ipmilan \
>         params pcmk_host_list="cs1san1" lanplus="1" ipaddr="10.96.0.21" login="admin" passwd="xxxxxxx" power_wait="4s" \
>         op monitor interval="60s" \
>         meta target-role="Started"
> primitive san1vip ocf:heartbeat:IPaddr2 \
>         params ip="10.94.0.101" cidr_netmask="24" \
>         op monitor interval="10s" \
>         meta target-role="Started"
> primitive san2fencer stonith:fence_ipmilan \
>         params pcmk_host_list="cs1san2" lanplus="1" ipaddr="10.96.0.22" login="admin" passwd="xxxxxxxx" power_wait="4s" \
>         op monitor interval="60s" \
>         meta target-role="Started"
> primitive san2vip ocf:heartbeat:IPaddr2 \
>         params ip="10.94.0.102" cidr_netmask="24" \
>         op monitor interval="10s" \
>         meta target-role="Started"
> group cs1ddb1grp cs1ddb1t1 cs1ddb1l1 \
>         meta target-role="Started"
> group cs1dws1grp cs1dws1t1 cs1dws1l1 \
>         meta target-role="Started"
> group cs1lb1grp cs1lb1t1 cs1lb1l1 \
>         meta target-role="Started"
> group cs1lb2grp cs1lb2t1 cs1lb2l1 \
>         meta target-role="Started"
> group cs1man1grp cs1man1t1 cs1man1l1 \
>         meta target-role="Started"
> group cs1master1grp cs1master1t1 cs1master1l1 \
>         meta target-role="Started"
> clone alerts alert \
>         meta target-role="Started"
> clone pings ping \
>         meta target-role="Started"
> location san1fence san1fencer -inf: cs1san1
> location san1loc cs1vg1 \
>         rule $id="san1loc-rule1" 50: #uname eq cs1san1 \
>         rule $id="san1loc-rule2" pingd: defined ping
> location san2fence san2fencer -inf: cs1san2
> location san2loc cs1vg2 \
>         rule $id="san2loc-rule1" 50: #uname eq cs1san2 \
>         rule $id="san2loc-rule2" pingd: defined ping
> colocation san1colo inf: ( cs1lb1grp cs1man1grp cs1master1grp cs1ddb1grp cs1dws1grp ) san1vip cs1vg1
> colocation san2colo inf: ( cs1lb2grp ) san2vip cs1vg2
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1353428951" \
>         stonith-enabled="true" \
>         maintenance-mode="false"
> ===================================================================
>
> Regards
> Craig