[Pacemaker] colocation conundrum

Tue Nov 20 19:38:29 UTC 2012

----- Original Message -----
> From: "Craig Donnelly" <craig at goaf.net>
> To: "Jake Smith" <jsmith at argotec.com>, "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Tuesday, November 20, 2012 2:28:23 PM
> Subject: Re: [Pacemaker] colocation conundrum
> 
> Hi Jake,
> 
> Thanks for the lightning response :)
> 
> Ahh yes apologies, the config I uploaded is a version where I took
> the order statements out to see what impact that had.
> I have used a config with the order statements exactly as you
> specified, namely:
> 
> colocation san1order inf: cs1vg1 san1vip ( cs1lb1grp cs1man1grp cs1master1grp cs1ddb1grp cs1dws1grp )
> colocation san2order inf: cs1vg2 san2vip ( cs1lb2grp )

I assume "colocation" above is a typo ;-)
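That is, I'm assuming the statements you actually loaded use the "order"
keyword, as in my earlier suggestion. A sketch of what I'd expect them to
look like (same IDs and resource sets as you quoted, only the keyword
changed; untested on my side):

```
order san1order inf: cs1vg1 san1vip ( cs1lb1grp cs1man1grp cs1master1grp cs1ddb1grp cs1dws1grp )
order san2order inf: cs1vg2 san2vip ( cs1lb2grp )
```

As far as I know, with "colocation" as the keyword these would just define
two more colocation constraints and no ordering at all.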

> 
> but the issue remains :(

Unfortunately I don't have any other thoughts at the moment.

> 
> Cheers
> Craig
> 
> 
> On 20 Nov 2012, at 19:16, Jake Smith wrote:
> 
> > 
> > 
> > 
> > ----- Original Message -----
> >> From: "Craig Donnelly" <craig at goaf.net>
> >> To: pacemaker at oss.clusterlabs.org
> >> Sent: Tuesday, November 20, 2012 1:56:03 PM
> >> Subject: [Pacemaker] colocation conundrum
> >> 
> >> Hi there,
> >> 
> >> I think I've exhausted everything I can find online in terms of trying
> >> to solve my problem, so here goes with a posting to see if anyone on
> >> this mailing list might be able to help, please.
> >> 
> >> I have a Pacemaker 1.1.7/Corosync 1.4.1 two-node cluster running on
> >> CentOS 6.3.
> >> I'm using this cluster to support shared storage using a combination
> >> of LVM and iSCSI.
> >> 
> >> Now failover works fine if I offline/stonith a node. However, when I
> >> bring the node back online, they enter a death-match situation.
> >> I see the issue as being with ordering/colocation/resource sets, and I
> >> have tried a bunch of different variations and read and re-read all
> >> the information I can find online without resolution.
> >> 
> >> Would really appreciate any help/advice.
> >> 
> >> The key entries that I can see in the logs are:
> >> 
> >> NODE1:
> >> ======
> >> Nov 20 12:16:38 cs1san1 iSCSILogicalUnit(cs1lb1l1)[2710]: ERROR: tgtadm: invalid request
> >> Nov 20 12:16:39 cs1san1 iSCSILogicalUnit(cs1man1l1)[2807]: ERROR: tgtadm: invalid request
> >> Nov 20 12:22:55 cs1san1 iSCSILogicalUnit(cs1master1l1)[4482]: ERROR: tgtadm: invalid request
> >> Nov 20 12:23:17 cs1san1 iSCSILogicalUnit(cs1ddb1l1)[4968]: ERROR: tgtadm: invalid request
> >> Nov 20 12:23:18 cs1san1 iSCSILogicalUnit(cs1master1l1)[5081]: ERROR: tgtadm: invalid request
> >> Nov 20 12:30:28 cs1san1 iSCSILogicalUnit(cs1lb1l1)[2670]: ERROR: tgtadm: invalid request
> >> 
> >> NODE2:
> >> ======
> >> Nov 20 12:16:38 cs1san2 LVM(cs1vg1)[22039]: ERROR: Can't deactivate volume group "cs1vg1" with 3 open logical volume(s)
> >> Nov 20 12:22:55 cs1san2 LVM(cs1vg1)[3386]: ERROR: Can't deactivate volume group "cs1vg1" with 2 open logical volume(s)
> >> Nov 20 12:23:17 cs1san2 LVM(cs1vg1)[4296]: ERROR: Can't deactivate volume group "cs1vg1" with 1 open logical volume(s)
> >> Nov 20 12:30:27 cs1san2 LVM(cs1vg1)[14943]: ERROR: Can't deactivate volume group "cs1vg1" with 4 open logical volume(s)
> >> 
> >> which, to me, clearly indicates an ordering issue, yet the
> >> configuration I have follows the colocation/ordering rules
> >> inasmuch as I can understand them.
> >> 
> >> My "current" CRM config is as follows:
> >> ==============================================================================
> >> node cs1san1 \
> >> 	attributes standby="off"
> >> node cs1san2 \
> >> 	attributes standby="off"
> >> primitive alert ocf:heartbeat:MailTo \
> >> 	params email="ops at xyz.com" subject="CS takeover event" \
> >> 	op monitor interval="10s"
> >> primitive cs1ddb1l1 ocf:heartbeat:iSCSILogicalUnit \
> >> 	params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1ddb1d1" lun="1" path="/dev/cs1vg1/cs1ddb1d1" \
> >> 	op monitor interval="10" timeout="15"
> >> primitive cs1ddb1t1 ocf:heartbeat:iSCSITarget \
> >> 	params iqn="iqn.2012-10.com.xyz.cs1san1:cs1ddb1d1" tid="7" \
> >> 	op monitor interval="10" timeout="15"
> >> primitive cs1dws1l1 ocf:heartbeat:iSCSILogicalUnit \
> >> 	params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1dws1d1" lun="1" path="/dev/cs1vg1/cs1dws1d1" \
> >> 	op monitor interval="10" timeout="15"
> >> primitive cs1dws1t1 ocf:heartbeat:iSCSITarget \
> >> 	params iqn="iqn.2012-10.com.xyz.cs1san1:cs1dws1d1" tid="8" \
> >> 	op monitor interval="10" timeout="15"
> >> primitive cs1lb1l1 ocf:heartbeat:iSCSILogicalUnit \
> >> 	params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1lb1d1" lun="1" path="/dev/cs1vg1/cs1lb1d1" \
> >> 	op start interval="0" timeout="15" \
> >> 	op stop interval="0" timeout="15" \
> >> 	op monitor interval="10" timeout="15" \
> >> 	meta is-managed="true"
> >> primitive cs1lb1t1 ocf:heartbeat:iSCSITarget \
> >> 	params iqn="iqn.2012-10.com.xyz.cs1san1:cs1lb1d1" tid="1" \
> >> 	op monitor interval="10" timeout="15"
> >> primitive cs1lb2l1 ocf:heartbeat:iSCSILogicalUnit \
> >> 	params target_iqn="iqn.2012-10.com.xyz.cs1san2:cs1lb2d1" lun="1" path="/dev/cs1vg2/cs1lb2d1" \
> >> 	op start interval="0" timeout="15" \
> >> 	op stop interval="0" timeout="15" \
> >> 	op monitor interval="10" timeout="15"
> >> primitive cs1lb2t1 ocf:heartbeat:iSCSITarget \
> >> 	params iqn="iqn.2012-10.com.xyz.cs1san2:cs1lb2d1" tid="2" \
> >> 	op monitor interval="10" timeout="15"
> >> primitive cs1man1l1 ocf:heartbeat:iSCSILogicalUnit \
> >> 	params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1man1d1" lun="1" path="/dev/cs1vg1/cs1man1d1" \
> >> 	op monitor interval="10" timeout="15"
> >> primitive cs1man1t1 ocf:heartbeat:iSCSITarget \
> >> 	params iqn="iqn.2012-10.com.xyz.cs1san1:cs1man1d1" tid="5" \
> >> 	op monitor interval="10" timeout="15"
> >> primitive cs1master1l1 ocf:heartbeat:iSCSILogicalUnit \
> >> 	params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1master1d1" lun="1" path="/dev/cs1vg1/cs1master1d1" \
> >> 	op monitor interval="10" timeout="15"
> >> primitive cs1master1t1 ocf:heartbeat:iSCSITarget \
> >> 	params iqn="iqn.2012-10.com.xyz.cs1san1:cs1master1d1" tid="6" \
> >> 	op monitor interval="10" timeout="15"
> >> primitive cs1vg1 ocf:heartbeat:LVM \
> >> 	params exclusive="true" volgrpname="cs1vg1" \
> >> 	op start interval="0" timeout="30s" \
> >> 	op stop interval="0" timeout="30s" \
> >> 	meta target-role="Started"
> >> primitive cs1vg2 ocf:heartbeat:LVM \
> >> 	params exclusive="true" volgrpname="cs1vg2" \
> >> 	op start interval="0" timeout="30s" \
> >> 	op stop interval="0" timeout="30s" \
> >> 	meta target-role="Started"
> >> primitive ping ocf:pacemaker:ping \
> >> 	params host_list="10.96.0.1 10.96.0.2" attempts="3" timeout="2s" multiplier="100" dampen="5s" \
> >> 	op monitor interval="10s"
> >> primitive san1fencer stonith:fence_ipmilan \
> >> 	params pcmk_host_list="cs1san1" lanplus="1" ipaddr="10.96.0.21" login="admin" passwd="xxxxxxx" power_wait="4s" \
> >> 	op monitor interval="60s" \
> >> 	meta target-role="Started"
> >> primitive san1vip ocf:heartbeat:IPaddr2 \
> >> 	params ip="10.94.0.101" cidr_netmask="24" \
> >> 	op monitor interval="10s" \
> >> 	meta target-role="Started"
> >> primitive san2fencer stonith:fence_ipmilan \
> >> 	params pcmk_host_list="cs1san2" lanplus="1" ipaddr="10.96.0.22" login="admin" passwd="xxxxxxxx" power_wait="4s" \
> >> 	op monitor interval="60s" \
> >> 	meta target-role="Started"
> >> primitive san2vip ocf:heartbeat:IPaddr2 \
> >> 	params ip="10.94.0.102" cidr_netmask="24" \
> >> 	op monitor interval="10s" \
> >> 	meta target-role="Started"
> >> group cs1ddb1grp cs1ddb1t1 cs1ddb1l1 \
> >> 	meta target-role="Started"
> >> group cs1dws1grp cs1dws1t1 cs1dws1l1 \
> >> 	meta target-role="Started"
> >> group cs1lb1grp cs1lb1t1 cs1lb1l1 \
> >> 	meta target-role="Started"
> >> group cs1lb2grp cs1lb2t1 cs1lb2l1 \
> >> 	meta target-role="Started"
> >> group cs1man1grp cs1man1t1 cs1man1l1 \
> >> 	meta target-role="Started"
> >> group cs1master1grp cs1master1t1 cs1master1l1 \
> >> 	meta target-role="Started"
> >> clone alerts alert \
> >> 	meta target-role="Started"
> >> clone pings ping \
> >> 	meta target-role="Started"
> >> location san1fence san1fencer -inf: cs1san1
> >> location san1loc cs1vg1 \
> >> 	rule $id="san1loc-rule1" 50: #uname eq cs1san1 \
> >> 	rule $id="san1loc-rule2" pingd: defined ping
> >> location san2fence san2fencer -inf: cs1san2
> >> location san2loc cs1vg2 \
> >> 	rule $id="san2loc-rule1" 50: #uname eq cs1san2 \
> >> 	rule $id="san2loc-rule2" pingd: defined ping
> >> colocation san1colo inf: ( cs1lb1grp cs1man1grp cs1master1grp cs1ddb1grp cs1dws1grp ) san1vip cs1vg1
> >> colocation san2colo inf: ( cs1lb2grp ) san2vip cs1vg2
> > 
> > First, my disclaimer: I don't use Pacemaker for iSCSI, so I'm not
> > sure about the *correct* ordering for iSCSI.
> > 
> > But after a quick glance it looks like you are missing the ordering
> > statements that coincide with your colocation statements.
> > Something like this, I would assume:
> > order san1order inf: cs1vg1 san1vip ( cs1lb1grp cs1man1grp cs1master1grp cs1ddb1grp cs1dws1grp )
> > order san2order inf: cs1vg2 san2vip ( cs1lb2grp )
> > 
> > HTH
> > 
> > Jake
> > 
> >> property $id="cib-bootstrap-options" \
> >> 	dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
> >> 	cluster-infrastructure="openais" \
> >> 	expected-quorum-votes="2" \
> >> 	no-quorum-policy="ignore" \
> >> 	last-lrm-refresh="1353428951" \
> >> 	stonith-enabled="true" \
> >> 	maintenance-mode="false"
> >> ===================================================================
> >> 
> >> Regards
> >> Craig
> >> 
> >> 
> >> 
> >> 
> >> _______________________________________________
> >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >> 
> >> Project Home: http://www.clusterlabs.org
> >> Getting started:
> >> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> Bugs: http://bugs.clusterlabs.org
> >> 
> >> 
> > 
> 
> 
>