[Pacemaker] Resources don't start on second node after ping fails

Fri Apr 9 07:47:38 UTC 2010

Hi everybody!

I fixed this 'problem'. My DRBD resource wasn't connected. m(
The ping resource and the location constraint were configured correctly. I implemented Marco's advice, but I'm sure my original solution would have worked as well.
Failover now works just fine.
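Checking DRBD's connection state first would have exposed this quickly. Below is a minimal Python sketch that flags DRBD devices that are not in the Connected state. It assumes the DRBD 8.3-style /proc/drbd layout (`cs:` connection state, `ro:` roles, `ds:` disk states). The sample text is illustrative, not output captured from bb-node1 or bb-node2. On a real node you would pass in the contents of /proc/drbd instead.

```python
import re

# Illustrative /proc/drbd text, assuming the DRBD 8.3-style layout.
# A device waiting for its peer reports cs:WFConnection instead of cs:Connected.
SAMPLE = """version: 8.3.7 (api:88/proto:86-91)
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""

# One match per device line: minor number, connection state, roles, disk states.
LINE_RE = re.compile(r"^[ \t]*(\d+):[ \t]+cs:(\S+)[ \t]+ro:(\S+)[ \t]+ds:(\S+)",
                     re.MULTILINE)


def drbd_states(text):
    """Return {minor: (connection_state, roles, disk_states)} for each device."""
    return {int(m.group(1)): (m.group(2), m.group(3), m.group(4))
            for m in LINE_RE.finditer(text)}


def unconnected_devices(text):
    """Return the sorted minor numbers of devices that are not 'Connected'."""
    return sorted(minor for minor, (cs, _, _) in drbd_states(text).items()
                  if cs != "Connected")


if __name__ == "__main__":
    print(drbd_states(SAMPLE))
    print(unconnected_devices(SAMPLE))  # [0]
```

A non-empty result means at least one device has no replication link. In that state the surviving node may not be able to take over as Master.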

Thanks for reading!
Benjamin Benz

-----Original Message-----
From: Benz, Benjamin
Sent: Thu 08.04.2010 14:46
To: pacemaker at oss.clusterlabs.org
Subject: [Pacemaker] Resources don't start on second node after ping fails

Hi there!

I have a problem with my configuration.
I'm using Pacemaker 1.0.7 to fail my database over from node1 to node2. Everything works fine when I migrate the resources manually or pull the power plug.
Since I want the database to stay available in case of network problems, I tried to integrate a ping resource, as you can see below.
When I pull the network cable, the resources stop on node1 but don't start on node2.

crm_mon output:

Online: [ bb-node1 bb-node2 ]

 Master/Slave Set: ms_drbd_ora
     Slaves: [ bb-node2 ]
     Stopped: [ drbd_ora:1 ]
 Clone Set: connected
     Started: [ bb-node1 bb-node2 ]

I guess something is wrong with my location constraint, but I can't figure out what.
It would be great if someone could help me out!

If you have any other hints about my configuration, feel free to share them!

Regards
Benjamin Benz

crm configure show:

node $id="d109b732-1cfc-4cd8-9cce-ba9323a56087" bb-node2
node $id="f995b3ac-734f-4cc4-aacb-cbec22e48de5" bb-node1
primitive drbd_ora ocf:linbit:drbd \
	params drbd_resource="ora" \
	op monitor interval="5s" timeout="20s" on-fail="restart"
primitive fs_ora ocf:heartbeat:Filesystem \
	params device="/dev/drbd0" directory="/oracle" fstype="ext3" \
	op monitor interval="5s" timeout="40s" on-fail="restart"
primitive ip_ora ocf:heartbeat:IPaddr2 \
	params ip="53.113.178.29" cidr_netmask="255.255.255.0" \
	op monitor interval="5s" timeout="20s" on-fail="restart"
primitive oracle_ora ocf:heartbeat:oracle \
	params home="/oracle" sid="bbcluster" user="oracle" ipcrm="orauser" \
	op monitor interval="5s" timeout="30s" on-fail="restart"
primitive oralsnr_ora ocf:heartbeat:oralsnr \
	params home="/oracle" sid="bbcluster" user="oracle" \
	op monitor interval="5s" timeout="30s" on-fail="restart"
primitive ping ocf:pacemaker:ping \
	params dampen="5s" host_list="53.118.160.121" multiplier="1000" name="pingval" \
	operations $id="ping-operations" \
	op monitor interval="10s" timeout="10s"
group ora_group fs_ora ip_ora oralsnr_ora oracle_ora \
	meta target-role="Started"
ms ms_drbd_ora drbd_ora \
	meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
clone connected ping \
	meta globally-unique="false" target-role="Started"

location ms_drbd_ora_on_connected_node ms_drbd_ora \
	rule $id="ms_drbd_ora_on_connected_node-rule" -inf: not_defined pingval or pingval lte 0

colocation ora_group_on_ms_drbd_ora inf: ora_group ms_drbd_ora:Master
order ms_drbd_ora_before_ora_group inf: ms_drbd_ora:promote ora_group:start
property $id="cib-bootstrap-options" \
	dc-version="1.0.7-6e1815972fc236825bf3658d7f8451d33227d420" \
	cluster-infrastructure="Heartbeat" \
	no-quorum-policy="ignore" \
	stonith-enabled="false" \
	last-lrm-refresh="1270732011"
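This is a simplified, hypothetical Python model of how the ping attribute and the constraints above interact. It may help trace why the resources stopped on node1 without starting on node2. It assumes that ocf:pacemaker:ping publishes `pingval` as multiplier × number of reachable hosts (1000 × 1 when 53.118.160.121 answers). It ignores `dampen`, other scores, and the DRBD agent's real promotion logic. `promotable` is an invented stand-in for that logic and is not a Pacemaker API.

```python
from typing import Dict, List, Optional

NEG_INF = float("-inf")


def ping_attribute(reachable_hosts: int, multiplier: int = 1000) -> int:
    """Value the ping agent is assumed to publish: multiplier * reachable hosts."""
    return multiplier * reachable_hosts


def location_score(pingval: Optional[int]) -> float:
    """Mirror the rule '-inf: not_defined pingval or pingval lte 0'."""
    if pingval is None or pingval <= 0:
        return NEG_INF
    return 0.0  # rule does not match, so this constraint adds nothing


def allowed_nodes(pingvals: Dict[str, Optional[int]]) -> List[str]:
    """Nodes where ms_drbd_ora is not banned by the location rule."""
    return sorted(n for n, v in pingvals.items() if location_score(v) != NEG_INF)


def group_node(pingvals: Dict[str, Optional[int]],
               promotable: Dict[str, bool]) -> Optional[str]:
    """Where ora_group could run, given 'colocation inf: ora_group ms_drbd_ora:Master'.

    The group needs a Master on the same node, so an allowed node only helps
    if DRBD can actually be promoted there (hypothetical 'promotable' flag).
    """
    for node in allowed_nodes(pingvals):
        if promotable.get(node, False):
            return node
    return None


if __name__ == "__main__":
    # Network cable pulled on bb-node1: only bb-node2 still reaches the ping host.
    after_pull = {"bb-node1": ping_attribute(0), "bb-node2": ping_attribute(1)}
    print(allowed_nodes(after_pull))                     # ['bb-node2']
    print(group_node(after_pull, {"bb-node2": True}))    # bb-node2
    print(group_node(after_pull, {"bb-node2": False}))   # None
```

In this model, passing the ping check is not enough. If DRBD cannot be promoted on bb-node2 (for example because the resource was never connected), nothing can become Master there, and ora_group stays stopped.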

_______________________________________________
Pacemaker mailing list
Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
