[Pacemaker] color_instance: Pre-allocation failed

Fri Dec 28 09:21:00 EST 2012

Every 15-18 minutes one of my resources gets stopped on one node and then
is restarted shortly after.

In the DC log I can see the following error lines.

Dec 28 15:04:09 app01 pengine: [8618]: debug: clone_rsc_colocation_rh: Pairing resOCFS:1 with groupOcfs2Mgmt:0
Dec 28 15:04:09 app01 pengine: [8618]: debug: native_assign_node: Assigning app02 to resOCFS:1
Dec 28 15:04:09 app01 pengine: [8618]: ERROR: color_instance: Pre-allocation failed: got app02 instead of app01
Dec 28 15:04:09 app01 pengine: [8618]: info: native_deallocate: Deallocating resOCFS:1 from app02
Dec 28 15:04:09 app01 pengine: [8618]: debug: clone_rsc_colocation_rh: Pairing resOCFS:0 with groupOcfs2Mgmt:0
Dec 28 15:04:09 app01 pengine: [8618]: debug: native_assign_node: Assigning app02 to resOCFS:0
Dec 28 15:04:09 app01 pengine: [8618]: debug: clone_rsc_colocation_rh: Pairing resOCFS:1 with groupOcfs2Mgmt:1
Dec 28 15:04:09 app01 pengine: [8618]: debug: clone_rsc_colocation_rh: Pairing resOCFS:1 with groupOcfs2Mgmt:1
Dec 28 15:04:09 app01 pengine: [8618]: debug: native_assign_node: All nodes for resource resOCFS:1 are unavailable, unclean or shutting down (app01: 1, -1000000)
Dec 28 15:04:09 app01 pengine: [8618]: debug: native_assign_node: Could not allocate a node for resOCFS:1
Dec 28 15:04:09 app01 pengine: [8618]: info: native_color: Resource resOCFS:1 cannot run anywhere

This same sequence appears in the log before every stop of the OCFS resource.
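
To check whether these errors really line up with the 15-18 minute stop cycle, the timestamps of the "Pre-allocation failed" lines can be pulled out of the DC log and the gaps between them computed. The following is only a minimal sketch: it assumes each log entry sits on a single line in the raw log file (unlike the wrapped lines in a mail), that the syslog format matches the lines quoted above, and that the year has to be supplied because syslog timestamps omit it. The function names are my own and not part of any Pacemaker tool.

```python
import re
from datetime import datetime

# Matches pengine lines of the form quoted above, e.g.
# "Dec 28 15:04:09 app01 pengine: [8618]: ERROR: color_instance:
#  Pre-allocation failed: got app02 instead of app01" (on one line).
PREALLOC = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ pengine: \[\d+\]: "
    r"ERROR: color_instance:\s+Pre-allocation failed: "
    r"got (?P<got>\S+) instead of (?P<want>\S+)"
)


def prealloc_failures(lines, year=2012):
    """Return (timestamp, got_node, wanted_node) for each matching line.

    The year is an assumption passed in by the caller, since the
    syslog timestamp does not carry one.
    """
    hits = []
    for line in lines:
        m = PREALLOC.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            hits.append((ts, m["got"], m["want"]))
    return hits


def gaps_in_minutes(hits):
    """Minutes elapsed between consecutive pre-allocation failures."""
    return [
        (later[0] - earlier[0]).total_seconds() / 60
        for earlier, later in zip(hits, hits[1:])
    ]
```

Feeding the lines of the DC log to `prealloc_failures()` and the result to `gaps_in_minutes()` should show whether the errors recur at roughly the same interval as the stops.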

Here is the CIB configuration.

primitive VirtualIP0 ocf:heartbeat:IPaddr2 \
        params ip="10.121.12.30" \
        op monitor interval="10s" \
        meta target-role="Started"
primitive resDLM ocf:pacemaker:controld
primitive resDrbdShared0 ocf:linbit:drbd \
        params drbd_resource="shared0" \
        operations $id="resDrbd-operations" \
        op monitor interval="20" role="Master" timeout="20" notify="true" \
        op monitor interval="30" role="Slave" timeout="20" notify="true"
primitive resJboss lsb:jboss4 \
        op monitor interval="120s" timeout="150s" \
        op start interval="0" timeout="150s" \
        op stop interval="0" timeout="150s"
primitive resO2CB ocf:pacemaker:o2cb
primitive resOCFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/shared0" directory="/data" fstype="ocfs2" \
        op monitor interval="120s" timeout="40" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60"
group groupOcfs2Mgmt resDLM resO2CB
ms msDrbdShared0 resDrbdShared0 \
        meta resource-stickines="100" notify="true" interleave="true" master-max="2" target-role="Started"
clone cloneJboss resJboss \
        meta interleave="true" ordered="true" is-managed="false" target-role="Started"
clone cloneOCFS resOCFS \
        meta interleave="true" ordered="true" target-role="Started" is-managed="true"
clone cloneOcfs2Mgmt groupOcfs2Mgmt \
        meta interleave="true" target-role="Started"
location locVirtualIP0 VirtualIP0 9001: app01
colocation colDRBD inf: cloneOcfs2Mgmt msDrbdShared0:Master
colocation colOcfs2 inf: cloneOCFS cloneOcfs2Mgmt
order ordDRBD inf: msDrbdShared0:promote cloneOcfs2Mgmt:start
order ordOcfs2 inf: cloneOcfs2Mgmt:start cloneOCFS:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1356702541"
rsc_defaults $id="rsc-options" \
        resource-stickiness="0"
op_defaults $id="op-options" \
        timeout="20s"

I first suspected faulty hostname resolution, but /etc/hosts is correct
and contains no duplicate names.

-- 
Hälsningar / Greetings

Stefan Midjich
[De omnibus dubitandum]